1. Introduction

Let $(X_1,\dots,X_n)$ be a random vector in $\mathbf{R}^n$ with distribution $\mu$. We study rates of approximation of the average marginal distribution function

$$F(x) = \mathbf{E}F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{P}\{X_i \le x\}$$

by the empirical distribution function

$$F_n(x) = \frac{1}{n}\,\mathrm{card}\{i \le n: X_i \le x\}, \qquad x \in \mathbf{R}.$$

We shall measure the distance between $F$ and $F_n$ by means of the (uniform) Kolmogorov metric $\|F_n - F\| = \sup_x |F_n(x) - F(x)|$, as well as by means of the $L^1$-metric

$$W_1(F_n,F) = \int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx.$$

The latter, also called the Kantorovich-Rubinstein distance, may be interpreted as the minimal cost needed to transport the empirical measure $F_n$ to $F$ with cost function $d(x,y) = |x - y|$ (the price paid to transport the point $x$ to the point $y$).
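To make the two metrics concrete, here is a minimal sketch that evaluates both distances for a finite sample against a given continuous distribution function. The helper names (`empirical_cdf`, `kolmogorov_distance`, `w1_distance`, `uniform01`) and the midpoint-rule integration over a finite window `[lo, hi]` are assumptions made for illustration only; they are not taken from the paper.

```python
import bisect


def empirical_cdf(sample):
    """Return F_n(x) = card{i : X_i <= x} / n as a callable."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def kolmogorov_distance(sample, cdf):
    """sup_x |F_n(x) - F(x)| for a continuous distribution function F.

    For continuous F the supremum is approached at a sample point,
    either from the right (F_n = (i+1)/n) or from the left (F_n = i/n).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


def w1_distance(sample, cdf, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of |F_n - F| over [lo, hi]."""
    fn = empirical_cdf(sample)
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += abs(fn(x) - cdf(x))
    return total * dx


def uniform01(x):
    """Distribution function of the uniform law on [0, 1] (illustrative choice)."""
    return min(1.0, max(0.0, x))


print(kolmogorov_distance([0.5], uniform01))
print(w1_distance([0.5], uniform01, 0.0, 1.0))
```

For a single observation at 0.5 and the uniform law on $[0,1]$, the integral of $|F_n - F|$ splits into two triangles of area $1/8$ each, which gives a simple hand check of the numerical routine.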

The classical example is the case where all $X_i$'s are independent and identically distributed (i.i.d.), that is, when $\mu$ represents a product measure on $\mathbf{R}^n$ with equal marginals, say, $F$. If it has no atoms, the distributions of the random variables $T_n = \sqrt{n}\,\|F_n - F\|$ are weakly convergent to the Kolmogorov law. Moreover, by the Dvoretzky-Kiefer-Wolfowitz theorem, the r.v.'s $T_n$ are uniformly sub-Gaussian and, in particular, $\mathbf{E}\|F_n - F\| \le \frac{1}{\sqrt{n}}$ up to a universal factor $C$ ([17]; cf. [32] for history and sharp bounds). This result, together with the related invariance principle, has a number of extensions to the case of dependent observations, mainly in terms of mixing conditions imposed on a stationary process; see, for example, [28, 38, 44].

On the other hand, the observations $X_1,\dots,X_n$ may also be generated by non-trivial functions of independent random variables. Of particular importance are random symmetric matrices $(\frac{1}{\sqrt{n}}\xi_{jk})$, $1 \le j,k \le n$, with i.i.d. entries above and on the diagonal. Arranging their eigenvalues $X_1 \le \dots \le X_n$ in increasing order, we arrive at the spectral empirical measures $F_n$. In this case, the mean $F = \mathbf{E}F_n$ also depends on $n$ and converges to the semicircle law under appropriate moment assumptions on $\xi_{jk}$ (cf., for example, [20, 36]).

The example of matrices strongly motivates the study of deviations of $F_n$ from the mean $F$ under general analytical hypotheses on the joint distribution of the observations, such as Poincaré or logarithmic Sobolev inequalities. A probability measure $\mu$ on $\mathbf{R}^n$ is said to satisfy a Poincaré-type or spectral gap inequality with constant $\sigma^2$ ($\sigma > 0$) if, for any bounded smooth function $g$ on $\mathbf{R}^n$ with gradient $\nabla g$,

$$\mathrm{Var}_\mu(g) \le \sigma^2 \int |\nabla g|^2\,d\mu. \tag{1.1}$$

In this case, we write PI($\sigma^2$) for short. Similarly, $\mu$ satisfies a logarithmic Sobolev inequality with constant $\sigma^2$ and we write LSI($\sigma^2$) if, for all bounded smooth $g$,

$$\mathrm{Ent}_\mu(g^2) \le 2\sigma^2 \int |\nabla g|^2\,d\mu. \tag{1.2}$$

Here, as usual, $\mathrm{Var}_\mu(g) = \int g^2\,d\mu - (\int g\,d\mu)^2$ stands for the variance of $g$ and $\mathrm{Ent}_\mu(g) = \int g\log g\,d\mu - \int g\,d\mu\,\log\int g\,d\mu$ denotes the entropy of $g \ge 0$ under the measure $\mu$. It is well known that LSI($\sigma^2$) implies PI($\sigma^2$).

These hypotheses are crucial in the study of concentration of the spectral empirical distributions, especially of the linear functionals $\int f\,dF_n$ with individual smooth $f$ on the line; see, for example, the results by Guionnet and Zeitouni [24], Chatterjee and Bose [14], Davidson and Szarek [15] and Ledoux [30]. A remarkable feature of this approach to spectral analysis is that no specific knowledge about the non-explicit mapping from a random matrix to its spectral empirical measure is required. Instead, one may use general Lipschitz properties only, which are satisfied by this mapping. As for the general (not necessarily matrix) scheme, we shall only require the hypotheses (1.1) or (1.2). In particular, we derive the following from (1.1).

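Theorem 1.1. Under PI($\sigma^2$) on $\mathbf{R}^n$ ($n \ge 2$),

$$\mathbf{E}\int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le C\sigma\Big(\frac{A + \log n}{n}\Big)^{1/3}, \tag{1.3}$$

where $A = \frac{1}{\sigma}\max_{i,j} |\mathbf{E}X_i - \mathbf{E}X_j|$ and $C$ is an absolute constant.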

Note that the Poincaré-type inequality (1.1) is invariant under shifts of the measure $\mu$, while the left-hand side of (1.3) is not. This is why the bound on the right-hand side of (1.3) should also depend on the means of the observations.

In terms of the ordered statistics $X_1^* \le \dots \le X_n^*$ of the random vector $(X_1,\dots,X_n)$, there is a general two-sided estimate for the mean of the Kantorovich-Rubinstein distance:

$$\frac{1}{2n}\sum_{i=1}^n \mathbf{E}|X_i^* - \mathbf{E}X_i^*| \le \mathbf{E}W_1(F_n,F) \le \frac{2}{n}\sum_{i=1}^n \mathbf{E}|X_i^* - \mathbf{E}X_i^*| \tag{1.4}$$

(see remarks at the end of Section 4). Hence, under the conditions of Theorem 1.1, one may control the local fluctuations of $X_i^*$ (on average), which typically deviate from their mean by not more than $C\sigma(\frac{A+\log n}{n})^{1/3}$.

Under a stronger hypothesis, such as (1.2), one can obtain more information about the fluctuations of $F_n(x) - F(x)$ for individual points $x$ and thus get some control of the Kolmogorov distance. Similarly to the bound (1.3), such fluctuations will, on average, be shown to be at most

$$\beta = \frac{(M\sigma)^{2/3}}{n^{1/3}},$$

in the sense that $\mathbf{E}|F_n(x) - F(x)| \le C\beta$, where $M$ is the Lipschitz seminorm of $F$ (see Proposition 6.3). As for the Kolmogorov distance, we prove the following theorem.

Theorem 1.2. Assume that $F$ has a density, bounded by a number $M$. Under LSI($\sigma^2$), for any $r > 0$,

$$\mathbf{P}\{\|F_n - F\| \ge r\} \le \frac{4}{r}\,e^{-c(r/\beta)^3}. \tag{1.5}$$

In particular,

$$\mathbf{E}\|F_n - F\| \le C\beta\log^{1/3}\Big(1 + \frac{1}{\beta}\Big), \tag{1.6}$$

where $c$ and $C$ are positive absolute constants.

In both cases, the stated bounds are of order $n^{-1/3}$ up to a $\log n$ term with respect to the dimension $n$. Thus, they are not as sharp as in the classical i.i.d. case. Indeed, our assumptions are much weaker and may naturally lead to weaker conclusions. Let us look at two examples illustrating the bounds obtained in the cases that essentially differ from the i.i.d. case.

Example 1. Let $X_i$ be independent and uniformly distributed in the intervals $(i-1,i)$, $i = 1,\dots,n$. Their joint distribution is a product measure, satisfying (1.1) and (1.2) with some absolute constant $\sigma$. Clearly, $F$ is the uniform distribution in $(0,n)$ so $M = \frac{1}{n}$ and $\beta$ is of order $\frac{1}{n}$. As is easy to see, $\mathbf{E}\|F_n - F\|$ is also of order $\frac{1}{n}$, that is, the bound (1.6) is sharp up to a $\log^{1/3} n$ term. Also, since $A$ is of order $n$, both sides of (1.3) are of order 1. In particular, this shows that the quantity $A$ cannot be removed from (1.3).
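The claim that $\|F_n - F\|$ is of order $1/n$ in Example 1 can be checked by direct simulation. The sketch below is only an illustration (the function name `example1_sup_distance` and the seed are arbitrary choices): since $F(x) = x/n$ on $[0,n]$ and exactly one observation falls in each unit interval, the supremum is approached at the jump points of $F_n$, and $n\|F_n - F\|$ always lies between $1/2$ and $1$.

```python
import random


def example1_sup_distance(n, rng):
    """Kolmogorov distance ||F_n - F|| when X_i is uniform on (i-1, i), i = 1..n."""
    # xs[i] is uniform on (i, i+1), so the sample is already sorted.
    xs = [i + rng.random() for i in range(n)]
    # F(x) = x/n on [0, n]; F_n jumps from i/n to (i+1)/n at the point xs[i].
    return max(max((i + 1 - x) / n, (x - i) / n) for i, x in enumerate(xs))


rng = random.Random(0)
for n in (10, 100, 1000):
    d = example1_sup_distance(n, rng)
    print(n, d, n * d)  # the rescaled distance n * d stays in [1/2, 1]
```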

Example 2. Let all $X_i = \xi$, where $\xi$ is uniformly distributed in $[-1,1]$. Note that all random variables are identically distributed with $\mathbf{E}X_i = 0$. The joint distribution $\mu$ represents a uniform distribution on the main diagonal of the cube $[-1,1]^n$, so it satisfies (1.1) and (1.2) with $\sigma = c\sqrt{n}$, where $c$ is absolute. In this case, $F$ is a uniform distribution on $[-1,1]$, so $M = 1/2$ and $\beta$ is of order 1. Hence, both sides of (1.6) are of order 1.

Next, we restrict the above statements to the empirical spectral measures $F_n$ of the $n$ eigenvalues $X_1 \le \dots \le X_n$ of a random symmetric matrix $(\frac{1}{\sqrt{n}}\xi_{jk})$, $1 \le j,k \le n$, with independent entries above and on the diagonal ($n \ge 2$). Assume that $\mathbf{E}\xi_{jk} = 0$ and $\mathrm{Var}(\xi_{jk}) = 1$ so that the means $F = \mathbf{E}F_n$ converge to the semicircle law $G$ with mean zero and variance one. The boundedness of moments of $\xi_{jk}$ of any order will be guaranteed by (1.1).

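Theorem 1.3. If the distributions of the $\xi_{jk}$'s satisfy the Poincaré-type inequality PI($\sigma^2$) on the real line, then

$$\mathbf{E}\int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le \frac{C\sigma}{n^{2/3}}, \tag{1.7}$$

where $C$ is an absolute constant. Moreover, under LSI($\sigma^2$),

$$\mathbf{E}\|F_n - G\| \le C\Big(\frac{\sigma}{n}\Big)^{2/3}\log^{1/3} n + \|F - G\|. \tag{1.8}$$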

By the convexity of the distance, we always have $\mathbf{E}\|F_n - G\| \ge \|F - G\|$. In some random matrix models, the Kolmogorov distance $\|F - G\|$ is known to tend to zero at rate at most $n^{-2/3+\varepsilon}$. For instance, it is true when the distributions of the $\xi_{jk}$'s have a non-trivial Gaussian component (see [22]). Hence, if, additionally, LSI($\sigma^2$) is satisfied, then we get that for any $\varepsilon > 0$,

$$\mathbf{E}\|F_n - G\| \le C_{\varepsilon,\sigma}\,n^{-2/3+\varepsilon}.$$

It is unknown whether this bound is optimal. Note, however, that in the case of Gaussian $\xi_{jk}$, the distance $\|F - G\|$ is known to be of order $1/n$ [21]. Therefore,

$$\mathbf{E}\|F_n - G\| \le C\,\frac{\log^{1/3} n}{n^{2/3}}, \tag{1.9}$$

which is a slight improvement of a bound obtained in [42]. In fact, as was recently shown in [8], we have $\|F - G\| \le C_\sigma n^{-2/3}$ in the presence of the PI($\sigma^2$)-hypothesis. Hence, the bound (1.9) always holds under LSI($\sigma^2$) with constants depending only on $\sigma$.

It seems natural to try to relax the LSI($\sigma^2$)-hypothesis in (1.8) and (1.9) to PI($\sigma^2$). In this context, let us mention a result of Chatterjee and Bose [14], who used Fourier transforms to derive from PI($\sigma^2$) a similar bound,

$$\mathbf{E}\|F_n - G\| \le \frac{C\sigma^{1/4}}{n^{1/2}} + 2\|F - G\|.$$

As for (1.7), let us return to the two-sided bound (1.4) which holds with $X_i^* = X_i$ by the convention that the eigenvalues are listed in increasing order. The asymptotic behavior of distributions of $X_i$ with fixed or varying indices has been studied by many authors, especially in the standard Gaussian case. In particular, if $i$ is fixed, while $n$ grows, $n^{2/3}(X_i - \mathbf{E}X_i)$ converges in distribution to (a variant of) the Tracy-Widom law so the $\mathbf{E}|X_i - \mathbf{E}X_i|$ are of order $n^{-2/3}$. This property still holds when $\xi_{jk}$ are symmetric and have sub-Gaussian tails; see [39] and [31] for the history and related results. Although this rate is consistent with the bound (1.7), the main contribution in the normalized sum (1.4) is due to the intermediate terms (in the bulk) and their rate might be different. It was shown by Gustavsson [25] for the GUE model that if $\frac{i}{n} \to t \in (0,1)$, then $X_i$ is asymptotically normal with variance of order $\frac{C(t)\log n}{n^2}$. Hence, it is not surprising that $\mathbf{E}W_1(F_n,F) \le \frac{C(\log n)^{1/2}}{n}$; see [42].

The paper is organized as follows. In Section 2, we collect a few direct applications of 
the Poincare-type inequality to linear functionals of empirical measures. They are used in 
Section 3 to complete the proof of Theorem 1.1. In the next section, we discuss deviations 
of $W_1(F_n,F)$ from its mean. In Section 5, we turn to logarithmic Sobolev inequalities.
Here, we shall adapt infimum-convolution operators to empirical measures and apply a
result of [5] on the relationship between infimum-convolution and log-Sobolev inequalities.
In Section 6, we illustrate this approach in the problem of dispersion of the values of the 
empirical distribution functions at a fixed point. In Section 7, we derive bounds on the 
uniform distance similar to (1.5) and (1.6) and give a somewhat more general form 
of Theorem 1.2. In Section 8, we apply the previous results to high-dimensional random 
matrices to prove Theorem 1.3 and obtain some refinements. Finally, since the hypotheses 
(1.1) and (1.2) play a crucial role in this investigation, we collect in the Appendix a few 
results on sufficient conditions for a measure to satisfy PI and LSI. 

2. Empirical Poincaré inequalities

We assume that the random variables $X_1,\dots,X_n$ have a joint distribution $\mu$ on $\mathbf{R}^n$, satisfying the Poincaré-type inequality (1.1). For a bounded smooth function $f$ on the real line, we apply it to

$$g(x_1,\dots,x_n) = \frac{f(x_1) + \dots + f(x_n)}{n} = \int f\,dF_n, \tag{2.1}$$

where $F_n$ is the empirical measure, defined for the 'observations' $X_1 = x_1,\dots,X_n = x_n$. Since

$$|\nabla g(x_1,\dots,x_n)|^2 = \frac{f'(x_1)^2 + \dots + f'(x_n)^2}{n^2} = \frac{1}{n}\int f'^2\,dF_n, \tag{2.2}$$

we obtain an integro-differential inequality, which may be viewed as an empirical Poincaré-type inequality for the measure $\mu$.

Proposition 2.1. Under PI($\sigma^2$), for any smooth $F$-integrable function $f$ on $\mathbf{R}$ such that $f'$ belongs to $L^2(\mathbf{R},dF)$, we have

$$\mathbf{E}\Big|\int f\,dF_n - \int f\,dF\Big|^2 \le \frac{\sigma^2}{n}\int f'^2\,dF. \tag{2.3}$$

Recall that $F = \mathbf{E}F_n$ denotes the mean of the empirical measures. The inequality continues to hold for all locally Lipschitz functions with the modulus of the derivative understood in the generalized sense, that is, $|f'(x)| = \limsup_{y \to x}\frac{|f(x) - f(y)|}{|x - y|}$. As long as $\int f'^2\,dF$ is finite, $\int f^2\,dF$ is also finite and (2.3) holds.
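As a sanity check of Proposition 2.1, which bounds $\mathbf{E}|\int f\,dF_n - \int f\,dF|^2$ by $\frac{\sigma^2}{n}\int f'^2\,dF$, one can look at a case where both sides are computable. The sketch below assumes i.i.d. standard Gaussian observations (whose product measure satisfies PI(1) by the classical Gaussian Poincaré inequality) and the test function $f = \sin$; both choices, as well as the function names, are illustrative assumptions rather than part of the paper. In this case the left-hand side equals $\mathrm{Var}(\sin X)/n = (1 - e^{-2})/(2n)$ and the right-hand side equals $\mathbf{E}\cos^2 X/n = (1 + e^{-2})/(2n)$.

```python
import math
import random


def lhs_monte_carlo(n, trials, rng):
    """Monte Carlo estimate of E |int f dF_n - int f dF|^2 for f = sin, X_i i.i.d. N(0,1)."""
    total = 0.0
    for _ in range(trials):
        mean_f = sum(math.sin(rng.gauss(0.0, 1.0)) for _ in range(n)) / n
        total += mean_f ** 2  # int f dF = E sin(X) = 0 by symmetry
    return total / trials


def lhs_exact(n):
    """Var(sin X) / n = (1 - e^{-2}) / (2n) for X ~ N(0,1)."""
    return (1.0 - math.exp(-2.0)) / (2.0 * n)


def rhs_exact(n, sigma2=1.0):
    """(sigma^2 / n) * E cos(X)^2 = sigma^2 * (1 + e^{-2}) / (2n) for X ~ N(0,1)."""
    return sigma2 * (1.0 + math.exp(-2.0)) / (2.0 * n)


n = 20
estimate = lhs_monte_carlo(n, 4000, random.Random(0))
print(estimate, lhs_exact(n), rhs_exact(n))
```

The gap between the two closed-form values shows that the inequality is not tight for this particular $f$, which is expected since equality in the Gaussian Poincaré inequality requires linear functions.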

Inequality (2.3) may be extended to all $L^p$-spaces by applying the following general lemma.

Lemma 2.2. Under PI($\sigma^2$), any Lipschitz function $g$ on $\mathbf{R}^n$ has a finite exponential moment: if $\int g\,d\mu = 0$ and $\|g\|_{\mathrm{Lip}} \le 1$, then

$$\int e^{tg/\sigma}\,d\mu \le \frac{2+t}{2-t}, \qquad 0 < t < 2. \tag{2.4}$$

Moreover, for any locally Lipschitz $g$ on $\mathbf{R}^n$ with $\mu$-mean zero,

$$\|g\|_p \le \sigma p\,\|\nabla g\|_p, \qquad p \ge 2. \tag{2.5}$$

More precisely, if $|\nabla g|$ is in $L^p(\mu)$, then so is $g$ and (2.5) holds true with the standard notation $\|g\|_p = (\int |g|^p\,d\mu)^{1/p}$ and $\|\nabla g\|_p = (\int |\nabla g|^p\,d\mu)^{1/p}$ for $L^p(\mu)$-norms. The property of being locally Lipschitz means that the function $g$ has a finite Lipschitz seminorm on every compact subset of $\mathbf{R}^n$.

In the concentration context, a variant of the first part of the lemma was first established by Gromov and Milman in [23] and independently in dimension 1 by Borovkov and Utev [12]. Here, we follow [10], Proposition 4.1, to state (2.4). The second inequality, (2.5), may be derived by similar arguments; see also [11], Theorem 4.1, for an extension to the case of Poincaré-type inequalities with weight.

Now, for functions $g = \int f\,dF_n$ as in (2.1), in view of (2.2), we may write

$$|\nabla g|^p = \frac{1}{n^{p/2}}\Big(\int f'^2\,dF_n\Big)^{p/2} \le \frac{1}{n^{p/2}}\int |f'|^p\,dF_n$$

so that $\mathbf{E}_\mu|\nabla g|^p \le \frac{1}{n^{p/2}}\int |f'|^p\,dF$. Applying (2.5) and (2.4) with $t = 1$, we obtain the following proposition.

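Proposition 2.3. Under PI($\sigma^2$), for any smooth function $f$ on $\mathbf{R}$ such that $f'$ belongs to $L^p(\mathbf{R},dF)$, $p \ge 2$,

$$\mathbf{E}\Big|\int f\,dF_n - \int f\,dF\Big|^p \le \frac{(\sigma p)^p}{n^{p/2}}\int |f'|^p\,dF.$$

In addition, if $|f'| \le 1$, for all $h > 0$,

$$\mu\Big\{\Big|\int f\,dF_n - \int f\,dF\Big| \ge h\Big\} \le 6e^{-\sqrt{n}\,h/\sigma}.$$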



The empirical Poincaré-type inequality (2.3) can be rewritten equivalently if we integrate by parts the first integral as $\int f\,dF_n - \int f\,dF = -\int f'(x)(F_n(x) - F(x))\,dx$. At this step, it is safe to assume that $f$ is continuously differentiable and is constant near $-\infty$ and $+\infty$. Replacing $f'$ with $f$, we arrive at

$$\mathbf{E}\Big|\int f(x)(F_n(x) - F(x))\,dx\Big|^2 \le \frac{\sigma^2}{n}\int f^2\,dF \tag{2.6}$$

for any continuous, compactly supported function $f$ on the line. In other words, the integral operator $Kf(x) = \int K(x,y)f(y)\,dy$ with a (positive definite) kernel

$$K(x,y) = \mathbf{E}(F_n(x) - F(x))(F_n(y) - F(y)) = \mathrm{cov}(F_n(x),F_n(y))$$

is continuous and defined on a dense subset of $L^2(\mathbf{R},dF(x))$, taking values in $L^2(\mathbf{R},dx)$. It has the operator norm $\|K\| \le \frac{\sigma}{\sqrt{n}}$, so it may be continuously extended to the space $L^2(\mathbf{R},dF)$ without a change of the norm. In the following, we will use a particular case of (2.6).



Corollary 2.4. Under PI($\sigma^2$), whenever $a < b$, we have

$$\mathbf{E}\Big|\int_a^b (F_n(x) - F(x))\,dx\Big| \le \frac{\sigma}{\sqrt{n}}\sqrt{F(b) - F(a)}.$$

3. Proof of Theorem 1.1

We shall now study the concentration properties of empirical measures $F_n$ around their mean $F$ based on Poincaré-type inequalities. In particular, we shall prove Theorem 1.1, which provides a bound on the mean of the Kantorovich-Rubinstein distance

$$W_1(F_n,F) = \int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx.$$

Note that it is homogeneous of order 1 with respect to the random vector $(X_1,\dots,X_n)$. We first need a general observation.

Lemma 3.1. Given distribution functions $F$ and $G$, for all real $a < b$ and a natural number $N$,

$$\int_a^b |F(x) - G(x)|\,dx \le \sum_{k=1}^N \Big|\int_{a_{k-1}}^{a_k} (F(x) - G(x))\,dx\Big| + \frac{2(b-a)}{N},$$

where $a_k = a + (b-a)\frac{k}{N}$.

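Proof. Let $I$ denote the collection of those indices $k$ such that in the $k$th subinterval $\Delta_k = (a_{k-1},a_k)$, the function $\varphi(x) = F(x) - G(x)$ does not change sign. Let $J$ denote the collection of the remaining indices. Then, for $k \in I$,

$$\int_{\Delta_k} |F(x) - G(x)|\,dx = \Big|\int_{\Delta_k} (F(x) - G(x))\,dx\Big|.$$

In the other case $k \in J$, since $\varphi$ changes sign on $\Delta_k$, we may write

$$\sup_{x \in \Delta_k} |\varphi(x)| \le \mathrm{Osc}_{\Delta_k}(\varphi) \equiv \sup_{x,y \in \Delta_k} (\varphi(x) - \varphi(y)) \le \mathrm{Osc}_{\Delta_k}(F) + \mathrm{Osc}_{\Delta_k}(G) = F(\Delta_k) + G(\Delta_k),$$

where, in the last step, $F$ and $G$ are treated as probability measures. Hence, in this case, $\int_{\Delta_k} |F(x) - G(x)|\,dx \le (F(\Delta_k) + G(\Delta_k))|\Delta_k|$. Combining the two bounds and using $|\Delta_k| = \frac{b-a}{N}$, we get that

$$\int_a^b |F(x) - G(x)|\,dx \le \sum_{k \in I} \Big|\int_{\Delta_k} (F(x) - G(x))\,dx\Big| + \sum_{k \in J} (F(\Delta_k) + G(\Delta_k))|\Delta_k| \le \sum_{k=1}^N \Big|\int_{\Delta_k} (F(x) - G(x))\,dx\Big| + \frac{b-a}{N}\sum_{k=1}^N (F(\Delta_k) + G(\Delta_k)),$$

and the last sum does not exceed 2. □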

Remark. As the proof shows, the lemma may be extended to an arbitrary partition $a = a_0 < a_1 < \dots < a_N = b$, as follows:

$$\int_a^b |F(x) - G(x)|\,dx \le \sum_{k=1}^N \Big|\int_{a_{k-1}}^{a_k} (F(x) - G(x))\,dx\Big| + 2\max_{1 \le k \le N} (a_k - a_{k-1}).$$



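Let us now apply the lemma to the space $(\mathbf{R}^n,\mu)$ satisfying a Poincaré-type inequality. Consider the partition of the interval $[a,b]$ with $\Delta_k = (a_{k-1},a_k)$, as in Lemma 3.1. By Corollary 2.4,

$$\mathbf{E}\int_a^b |F_n(x) - F(x)|\,dx \le \sum_{k=1}^N \mathbf{E}\Big|\int_{\Delta_k} (F_n(x) - F(x))\,dx\Big| + \frac{2(b-a)}{N} \le \frac{\sigma}{\sqrt{n}}\sum_{k=1}^N \sqrt{F(\Delta_k)} + \frac{2(b-a)}{N}.$$

By Cauchy's inequality, $\sum_{k=1}^N \sqrt{F(\Delta_k)} \le \sqrt{N}\,(\sum_{k=1}^N F(\Delta_k))^{1/2} \le \sqrt{N}$, hence,

$$\mathbf{E}\int_a^b |F_n(x) - F(x)|\,dx \le \frac{\sigma\sqrt{N}}{\sqrt{n}} + \frac{2(b-a)}{N}.$$

Now, let us rewrite the right-hand side as $\frac{\sigma}{\sqrt{n}}(\sqrt{N} + \frac{c}{N})$ with parameter $c = \frac{2(b-a)\sqrt{n}}{\sigma}$ and optimize it over $N$. On the half-axis $x > 0$, introduce the function $\psi(x) = \sqrt{x} + \frac{c}{x}$ ($c > 0$). It has derivative $\psi'(x) = \frac{1}{2\sqrt{x}} - \frac{c}{x^2}$, therefore $\psi$ is decreasing on $(0,x_0]$ and is increasing on $[x_0,+\infty)$, where $x_0 = (2c)^{2/3}$. Hence, if $c \le \frac{1}{2}$, we have

$$\inf_N \psi(N) = \psi(1) = 1 + c \le 1 + c^{1/3}.$$

If $c \ge \frac{1}{2}$, then the argmin lies in $[1,+\infty)$. Choose $N = [x_0] + 1 = [(2c)^{2/3}] + 1$ so that $N \ge 2$ and $N - 1 \le x_0 < N \le x_0 + 1$. Hence, we get

$$\psi(N) \le (\sqrt{x_0} + 1) + \frac{c}{x_0} = 1 + \psi(x_0) = 1 + \frac{3}{2^{2/3}}\,c^{1/3}.$$

Thus, in both cases, $\inf_N \psi(N) \le 1 + \frac{3}{2^{2/3}}\,c^{1/3} \le 1 + 3(\frac{b-a}{\sigma}\sqrt{n})^{1/3}$ and we arrive at the following corollary.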



Corollary 3.2. Under PI($\sigma^2$), for all $a < b$,

$$\mathbf{E}\int_a^b |F_n(x) - F(x)|\,dx \le \frac{\sigma}{\sqrt{n}}\Big[1 + 3\Big(\frac{b-a}{\sigma}\sqrt{n}\Big)^{1/3}\Big].$$
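The optimization over the number of cells $N$ that leads to Corollary 3.2 is elementary and can be verified numerically. The sketch below is an illustration only: the function names are hypothetical, and the integer minimizer is found by comparing the two integers adjacent to $x_0 = (2c)^{2/3}$, which is legitimate because $\psi(x) = \sqrt{x} + c/x$ first decreases and then increases. The minimum is compared with the bound $1 + \frac{3}{2^{2/3}}c^{1/3}$ from the derivation.

```python
import math


def psi(x, c):
    """psi(x) = sqrt(x) + c/x, the function optimized over the number of cells N."""
    return math.sqrt(x) + c / x


def best_N(c):
    """Integer minimizer of psi over N >= 1.

    psi decreases on (0, x0] and increases on [x0, inf) with x0 = (2c)^(2/3),
    so the integer minimum is at floor(x0) or ceil(x0), clamped to N >= 1.
    """
    x0 = (2.0 * c) ** (2.0 / 3.0)
    candidates = {max(1, math.floor(x0)), max(1, math.ceil(x0))}
    return min(candidates, key=lambda N: psi(N, c))


def upper_bound(c):
    """The bound 1 + (3 / 2^(2/3)) * c^(1/3) on inf_N psi(N)."""
    return 1.0 + 3.0 / 2.0 ** (2.0 / 3.0) * c ** (1.0 / 3.0)


for c in (0.1, 0.5, 1.0, 10.0, 1000.0):
    N = best_N(c)
    print(c, N, psi(N, c), upper_bound(c))
```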

The next step is to extend the above inequality to the whole real line. Here, we shall use the exponential integrability of the measure $F$.

Proof of Theorem 1.1. Recall that the measure $\mu$ is controlled by using two independent parameters: the constant $\sigma^2$ and $A$, defined by

$$|\mathbf{E}X_i - \mathbf{E}X_j| \le A\sigma, \qquad 1 \le i,j \le n.$$

One may assume, without loss of generality, that $-A\sigma \le \mathbf{E}X_i \le A\sigma$ for all $i \le n$. Lemma 2.2 with $g(x) = x_i$, $t = 1$ and Chebyshev's inequality give, for all $h > 0$,

$$\mathbf{P}\{X_i - \mathbf{E}X_i \ge h\} \le 3e^{-h/\sigma}, \qquad \mathbf{P}\{X_i - \mathbf{E}X_i \le -h\} \le 3e^{-h/\sigma}.$$

Therefore, whenever $h \ge A\sigma$,

$$\mathbf{P}\{X_i \ge h\} \le 3e^{-(h - A\sigma)/\sigma}, \qquad \mathbf{P}\{X_i \le -h\} \le 3e^{-(h - A\sigma)/\sigma}.$$

Averaging over all $i$'s, we obtain similar bounds for the measure $F$, that is, $1 - F(h) \le 3e^{-(h - A\sigma)/\sigma}$ and $F(-h) \le 3e^{-(h - A\sigma)/\sigma}$. After integration, we get

$$\int_h^{+\infty} (1 - F(x))\,dx \le 3\sigma e^{-(h - A\sigma)/\sigma}, \qquad \int_{-\infty}^{-h} F(x)\,dx \le 3\sigma e^{-(h - A\sigma)/\sigma}.$$

Using $|F_n(x) - F(x)| \le (1 - F_n(x)) + (1 - F(x))$ so that $\mathbf{E}|F_n(x) - F(x)| \le 2(1 - F(x))$, we get that

$$\mathbf{E}\int_h^{+\infty} |F_n(x) - F(x)|\,dx \le 6\sigma e^{-(h - A\sigma)/\sigma}$$

and similarly for the half-axis $(-\infty,-h)$. Combining this bound with Corollary 3.2, with $[a,b] = [-h,h]$, we obtain that, for all $h \ge A\sigma$,

$$\mathbf{E}\int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le \frac{\sigma}{\sqrt{n}}\Big[1 + 6\Big(\frac{h}{\sigma}\sqrt{n}\Big)^{1/3}\Big] + 12\sigma e^{-(h - A\sigma)/\sigma}.$$

Substituting $h = (A + t)\sigma$ with arbitrary $t \ge 0$, we get that

$$\mathbf{E}\int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le \frac{\sigma}{\sqrt{n}}\Big[1 + 6((A + t)\sqrt{n})^{1/3} + 12\sqrt{n}\,e^{-t}\Big].$$

Finally, the choice $t = \log n$ leads to the desired estimate

$$\mathbf{E}\int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le C\sigma\Big(\frac{A + \log n}{n}\Big)^{1/3}.$$

□

4. Large deviations above the mean

In addition to the upper bound on the mean of the Kantorovich-Rubinstein distance $W_1(F_n,F)$, one may wonder how to bound large deviations of $W_1(F_n,F)$ above the mean. To this end, the following general observation may be helpful.

Lemma 4.1. For all points $x = (x_1,\dots,x_n)$, $x' = (x_1',\dots,x_n')$ in $\mathbf{R}^n$, we have

$$W_1(F_n,F_n') \le \frac{1}{\sqrt{n}}\,\|x - x'\|,$$

where $F_n = \frac{\delta_{x_1} + \dots + \delta_{x_n}}{n}$, $F_n' = \frac{\delta_{x_1'} + \dots + \delta_{x_n'}}{n}$.



In other words, the canonical map $T$ from $\mathbf{R}^n$ to the space of all probability measures on the line, which assigns to each point an associated empirical measure, has a Lipschitz seminorm $\le \frac{1}{\sqrt{n}}$ with respect to the Kantorovich-Rubinstein distance. As usual, the Euclidean space $\mathbf{R}^n$ is equipped with the Euclidean metric

$$\|x - x'\| = \sqrt{|x_1 - x_1'|^2 + \dots + |x_n - x_n'|^2}.$$

Denote by $Z_1$ the collection of all (Borel) probability measures on the real line with finite first moment. The Kantorovich-Rubinstein distance in $Z_1$ may equivalently be defined (cf. [16, 43]) by

$$W_1(G,G') = \inf_\pi \int |u - u'|\,d\pi(u,u'),$$

where the infimum is taken over all (Borel) probability measures $\pi$ on $\mathbf{R} \times \mathbf{R}$ with marginal distributions $G$ and $G'$. In case of empirical measures $G = F_n$, $G' = F_n'$, associated to the points $x,x' \in \mathbf{R}^n$, let $\pi_0$ be the discrete measure on the pairs $(x_i,x_i')$, $1 \le i \le n$, with point masses $\frac{1}{n}$. Therefore, by Cauchy's inequality,

$$W_1(Tx,Tx') \le \int |u - u'|\,d\pi_0(u,u') = \frac{1}{n}\sum_{i=1}^n |x_i - x_i'| \le \frac{1}{\sqrt{n}}\Big(\sum_{i=1}^n |x_i - x_i'|^2\Big)^{1/2}.$$

This proves Lemma 4.1.
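Lemma 4.1 is easy to test numerically. The sketch below is illustrative (the function names are hypothetical); it relies on the representation recalled in the Remarks at the end of this section, namely that on the real line the optimal coupling of two empirical measures with the same number of atoms pairs the sorted points, so that $W_1(F_n,F_n') = \frac{1}{n}\sum_i |x_{(i)} - x'_{(i)}|$.

```python
import math
import random


def w1_empirical(x, y):
    """W_1 between the empirical measures of x and y (same length n).

    On the real line the optimal coupling pairs the sorted points,
    so W_1 = (1/n) * sum_i |x_(i) - y_(i)|.
    """
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


def lipschitz_bound(x, y):
    """Right-hand side of Lemma 4.1: ||x - y|| / sqrt(n)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [rng.gauss(0.5, 2.0) for _ in range(50)]
print(w1_empirical(x, y), lipschitz_bound(x, y))
```

The bound is attained when all coordinates are shifted by the same amount, for example $x = (0,0)$ and $x' = (1,1)$, so the constant $\frac{1}{\sqrt{n}}$ cannot be improved.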

Thus, the map $T:\mathbf{R}^n \to Z_1$ has the Lipschitz seminorm $\|T\|_{\mathrm{Lip}} \le \frac{1}{\sqrt{n}}$. As a consequence, given a probability measure $\mu$ on $\mathbf{R}^n$, this map transports many potential properties of $\mu$, such as concentration, to the space $Z_1$, equipped with the Borel probability measure $\Lambda = \mu T^{-1}$. Note that it is supported on the set of all probability measures with at most $n$ atoms. In particular, if $\mu$ satisfies a concentration inequality of the form

$$1 - \mu(A^h) \le \alpha(h), \qquad h > 0, \tag{4.1}$$

in the class of all Borel sets $A$ in $\mathbf{R}^n$ with measure $\mu(A) \ge \frac{1}{2}$ (where $A^h$ denotes an open Euclidean $h$-neighborhood of $A$), then $\Lambda$ satisfies a similar (and in fact stronger) property

$$1 - \Lambda(B^h) \le \alpha(h\sqrt{n}), \qquad h > 0,$$

in the class of all Borel sets $B$ in $Z_1$ with measure $\Lambda(B) \ge \frac{1}{2}$ (with respect to the Kantorovich-Rubinstein distance). In other words, an optimal so-called concentration function $\alpha = \alpha_\mu$ in (4.1) for the measure $\mu$ is related to the concentration function of $\Lambda$ by

$$\alpha_\Lambda(h) \le \alpha_\mu(h\sqrt{n}), \qquad h > 0. \tag{4.2}$$

Now, in general, the concentration function has a simple functional description as

$$\alpha_\mu(h) = \sup \mu\{g - m(g) \ge h\},$$

where the sup is taken over all Lipschitz functions $g$ on $\mathbf{R}^n$ with $\|g\|_{\mathrm{Lip}} \le 1$ and where $m(g)$ stands for a median of $g$ under $\mu$. (Actually, this holds for abstract metric spaces.) The concentration function may therefore be controlled by Poincaré-type inequalities in terms of $\sigma^2$ (the Gromov-Milman theorem). Indeed, since the quantity $g - m(g)$ is translation invariant, one may assume that $g$ has mean zero. By Lemma 2.2 with $t = 1$, we get $\mu\{g \le -\sigma h\} \le 3e^{-h} < \frac{1}{2}$, provided that $h > \log 6$, which means that any median of $g$ satisfies $m(g) \ge -\sigma\log 6$. Therefore, again by Lemma 2.2, for any $h > \log 6$,

$$\mu\{g - m(g) \ge \sigma h\} \le \mu\{g \ge \sigma(h - \log 6)\} \le 3 \cdot 6 \cdot e^{-h}$$

so that

$$\alpha_\mu(\sigma h) \le 18e^{-h}. \tag{4.3}$$

The latter also automatically holds in the interval $0 \le h \le \log 6$. In fact, by a more careful application of the Poincaré-type inequality, the concentration bound (4.3) may be further improved to $\alpha_\mu(\sigma h) \le Ce^{-2h}$ (see [3]), but this is not crucial for our purposes. Thus, combining (4.2) with (4.3), we may conclude that under PI($\sigma^2$),

$$\alpha_\Lambda(h) \le 18e^{-h\sqrt{n}/\sigma}, \qquad h > 0.$$

Now, in the setting of Theorem 1.1, consider on $Z_1$ the distance function $g(H) = W_1(H,F)$. It is Lipschitz (with Lipschitz seminorm 1) and has the mean $\mathbf{E}_\Lambda g = \mathbf{E}_\mu W_1(F_n,F) \le a$, where $a = C\sigma(\frac{A + \log n}{n})^{1/3}$. Hence, $m(g) \le 2a$ under the measure $\Lambda$ and for any $h > 0$,

$$\Lambda\{g \ge 2a + h\} \le \Lambda\{g - m(g) \ge h\} \le \alpha_\Lambda(h) \le 18e^{-h\sqrt{n}/\sigma}.$$

We can summarize as follows.

Proposition 4.2. If a random vector $(X_1,\dots,X_n)$ in $\mathbf{R}^n$, $n \ge 2$, has distribution satisfying a Poincaré-type inequality with constant $\sigma^2$, then, for all $h > 0$,

$$\mu\Big\{W_1(F_n,F) \ge C\sigma\Big(\frac{A + \log n}{n}\Big)^{1/3} + h\Big\} \le Ce^{-h\sqrt{n}/\sigma}, \tag{4.4}$$

where $A = \frac{1}{\sigma}\max_{i,j} |\mathbf{E}X_i - \mathbf{E}X_j|$ and where $C$ is an absolute constant.

Bounds such as (4.4) may be used to prove that the convergence holds almost surely at a certain rate. Here is a simple example, corresponding to non-varying values of the Poincaré constants. (One should properly modify the conclusion when applying this to the matrix scheme; see Section 7.) Let $(X_n)_{n \ge 1}$ be a random sequence such that for each $n$, $(X_1,\dots,X_n)$ has distribution on $\mathbf{R}^n$ satisfying PI($\sigma^2$) with some common $\sigma$.

Corollary 4.3. If $\max_{i,j \le n} |\mathbf{E}X_i - \mathbf{E}X_j| = O(\log n)$, then $W_1(F_n,F) = O(\frac{\log n}{n})^{1/3}$ with probability 1.

Note, however, that in the scheme of sequences such as in Corollary 4.3, the mean distribution function $F = \mathbf{E}F_n$ might also depend on $n$.

By a similar contraction argument, the upper bound (4.4) may be sharpened, when the distribution of $(X_1,\dots,X_n)$ satisfies a logarithmic Sobolev inequality. We turn to this type of (stronger) hypothesis in the next section.

Remarks. Let $\Omega$ be a metric space and let $d = d(u,u')$ be a non-negative continuous function on the product space $\Omega \times \Omega$. Given Borel probability measures $G$ and $G'$ on $\Omega$, the generalized Kantorovich-Rubinstein or Wasserstein 'distance' with cost function $d$ is defined by

$$W(G,G') = \inf_\pi \int d(u,u')\,d\pi(u,u'),$$

where the infimum is taken over all probability measures $\pi$ on $\Omega \times \Omega$ with marginal distributions $G$ and $G'$. In the case of the real line $\Omega = \mathbf{R}$ with cost function of the form $d(u,u') = \varphi(u - u')$, where $\varphi$ is convex, this quantity has a simple description,

$$W(G,G') = \int_0^1 \varphi(G^{-1}(t) - G'^{-1}(t))\,dt, \tag{4.5}$$

in terms of the inverse distribution functions $G^{-1}(t) = \min\{x \in \mathbf{R}: G(x) \ge t\}$; see, for example, [13] and [37], Theorem 2.

If $d(u,u') = |u - u'|$, then we also have the $L^1$-representation for $W_1(G,G')$, which we use from the very beginning as our definition. Moreover, for arbitrary discrete measures $G = F_n = (\delta_{x_1} + \dots + \delta_{x_n})/n$ and $G' = F_n' = (\delta_{x_1'} + \dots + \delta_{x_n'})/n$, as in Lemma 4.1, the expression (4.5) is reduced to

$$W(F_n,F_n') = \frac{1}{n}\sum_{i=1}^n \varphi(x_i - x_i'), \tag{4.6}$$

where we assume that $x_1 \le \dots \le x_n$ and $x_1' \le \dots \le x_n'$.

Now, for an arbitrary random vector $X = (X_1,\dots,X_n)$ in $\mathbf{R}^n$, consider the ordered statistics $X_1^* \le \dots \le X_n^*$. Equation (4.6) then yields

$$\mathbf{E}W_1(F_n,F_n') = \frac{1}{n}\sum_{i=1}^n \mathbf{E}|X_i^* - (X_i')^*|, \tag{4.7}$$

where $(X_1')^* \le \dots \le (X_n')^*$ are ordered statistics generated by an independent copy of $X$ and where $F_n'$ are independent copies of the (random) empirical measures $F_n$ associated with $X$. By the triangle inequality for the metric $W_1$, we have

$$\mathbf{E}W_1(F_n,F_n') \le \mathbf{E}W_1(F_n,F) + \mathbf{E}W_1(F,F_n') = 2\,\mathbf{E}W_1(F_n,F).$$

It is applied with the mean distribution function $F = \mathbf{E}F_n$. On the other hand, any function of the form $H \to W_1(G,H)$ is convex on the convex set $Z_1$, so, by Jensen's inequality, $\mathbf{E}W_1(F_n,F_n') \ge \mathbf{E}W_1(F_n,\mathbf{E}F_n') = \mathbf{E}W_1(F_n,F)$. The two bounds give

$$\mathbf{E}W_1(F_n,F) \le \mathbf{E}W_1(F_n,F_n') \le 2\,\mathbf{E}W_1(F_n,F). \tag{4.8}$$

By a similar argument,

$$\mathbf{E}|X_i^* - \mathbf{E}X_i^*| \le \mathbf{E}|X_i^* - (X_i')^*| \le 2\,\mathbf{E}|X_i^* - \mathbf{E}X_i^*|. \tag{4.9}$$

Combining (4.8) and (4.9) and recalling (4.7), we arrive at the two-sided estimate

$$\frac{1}{2n}\sum_{i=1}^n \mathbf{E}|X_i^* - \mathbf{E}X_i^*| \le \mathbf{E}W_1(F_n,F) \le \frac{2}{n}\sum_{i=1}^n \mathbf{E}|X_i^* - \mathbf{E}X_i^*|,$$

which is exactly the inequality (1.4) mentioned in the Introduction. Similar two-sided estimates also hold for other cost functions in the Wasserstein distance.



5. Empirical log-Sobolev inequalities 

As before, let $(X_1,\dots,X_n)$ be a random vector in $\mathbf{R}^n$ with joint distribution $\mu$. Similarly to Proposition 2.1, now using a log-Sobolev inequality for $\mu$, we arrive at the following 'empirical' log-Sobolev inequality.

Proposition 5.1. Under LSI($\sigma^2$), for any bounded, smooth function $f$ on $\mathbf{R}$,

$$\mathrm{Ent}_\mu\Big[\Big(\int f\,dF_n\Big)^2\Big] \le \frac{2\sigma^2}{n}\int f'^2\,dF.$$

In analogy with Poincaré-type inequalities, one may also develop refined applications to the rate of growth of moments and to large deviations of various functionals of empirical measures. In particular, we have the following proposition.

Proposition 5.2. Under LSI($\sigma^2$), for any smooth function $f$ on $\mathbf{R}$ such that $f'$ belongs to $L^p(\mathbf{R},dF)$, $p \ge 2$,

$$\mathbf{E}\Big|\int f\,dF_n - \int f\,dF\Big|^p \le \frac{(\sigma\sqrt{p})^p}{n^{p/2}}\int |f'|^p\,dF. \tag{5.1}$$

In addition, if $|f'| \le 1$, then, for all $h > 0$,

$$\mu\Big\{\Big|\int f\,dF_n - \int f\,dF\Big| \ge h\Big\} \le 2e^{-nh^2/2\sigma^2}. \tag{5.2}$$

The proof of the second bound, (5.2), which was already noticed in [24] in the context of random matrices, follows the standard Herbst's argument; see [29] and [6]. The first family of moment inequalities, (5.1), can be sharpened by one inequality on the Laplace transform, such as

$$\mathbf{E}\exp\Big\{\int f\,dF_n - \int f\,dF\Big\} \le \mathbf{E}\exp\Big\{\frac{\sigma^2}{n}\int |f'|^2\,dF_n\Big\}.$$

The proof is immediate, by [6], Theorem 1.2.

However, a major weak point in both Poincare and log-Sobolev inequalities, including 
their direct consequences, as in Proposition 5.2, is that they may not be applied to indicator and other non-smooth functions. In particular, we cannot estimate directly at fixed points the variance $\mathrm{Var}(F_n(x))$ or other similar quantities like the higher moments of $|F_n(x) - F(x)|$. Therefore, we need another family of analytic inequalities. Fortunately,
the so-called infimum-convolution operator and associated relations concerning arbitrary 
measurable functions perfectly fit our purposes. Moreover, some of the important rela- 
tions hold true and may be controlled in terms of the constant involved in the logarithmic 
Sobolev inequalities. 

Let us now turn to the important concept of infimum- and supremum-convolution inequalities. They were proposed in 1991 by Maurey [33] as a functional approach to some of Talagrand's concentration results concerning product measures. Given a parameter $t > 0$ and a real-valued function $g$ on $\mathbf{R}^n$ (possibly taking the values $\pm\infty$), put

$$Q_t g(x) = \inf_{y \in \mathbf{R}^n}\Big[g(y) + \frac{1}{2t}\|x - y\|^2\Big],$$

$$P_t g(x) = \sup_{y \in \mathbf{R}^n}\Big[g(y) - \frac{1}{2t}\|x - y\|^2\Big].$$

$Q_t g$ and $P_t g$ then represent, respectively, the infimum- and supremum-convolution of $g$ with cost function being the normalized square of the Euclidean norm in $\mathbf{R}^n$. By definition, one puts $Q_0 g = P_0 g = g$.

For basic definitions and basic properties of the infimum- and supremum-convolution operators, we refer the reader to [19] and [5], mentioning just some of them here. These operators are dually related by the property that for any functions $f$ and $g$ on $\mathbf{R}^n$, $g \ge P_t f \iff f \le Q_t g$. Clearly, $P_t(-g) = -Q_t g$. Thus, in many statements, it is sufficient to consider only one of these operators. The basic semigroup property of both operators is that for any $g$ on $\mathbf{R}^n$ and $t,s \ge 0$,

$$Q_{t+s}\,g = Q_t Q_s g, \qquad P_{t+s}\,g = P_t P_s g.$$

For any function $g$ and $t > 0$, the function $P_t g$ is always lower semicontinuous, while $Q_t g$ is upper semicontinuous. If $g$ is bounded, then $P_t g$ and $Q_t g$ are bounded and have finite Lipschitz seminorms. In particular, both are differentiable almost everywhere.

Given a bounded function $g$ and $t > 0$, for almost all $x \in \mathbf{R}^n$, the functions $t \to P_t g(x)$ and $t \to Q_t g(x)$ are differentiable at $t$ and

$$\frac{\partial P_t g(x)}{\partial t} = \frac{1}{2}\,\|\nabla P_t g(x)\|^2, \qquad \frac{\partial Q_t g(x)}{\partial t} = -\frac{1}{2}\,\|\nabla Q_t g(x)\|^2.$$

In other words, the operator $\Gamma g = \frac{1}{2}|\nabla g|^2$ appears as the generator for the semigroup $P_t$, while $-\Gamma$ appears as the generator for $Q_t$. As a result, $u(x,t) = Q_t g(x)$ represents the solution to the Hamilton-Jacobi equation $\frac{\partial u}{\partial t} = -\frac{1}{2}\|\nabla u\|^2$ with initial condition $u(x,0) = g(x)$.

Below, we separately formulate a principal result of [5] which relates logarithmic Sobolev inequalities to supremum- and infimum-convolution operators.



Lemma 5.3. Let $\mu$ be a probability measure on $\mathbf{R}^n$ satisfying LSI($\sigma^2$). For any $\mu$-integrable Borel-measurable function $g$ on $\mathbf{R}^n$, we have

$$\int P_{\sigma^2}\,g\,d\mu \ge \log\int e^{g}\,d\mu \tag{5.3}$$

and, equivalently,

$$\int g\,d\mu \ge \log\int e^{Q_{\sigma^2}\,g}\,d\mu. \tag{5.4}$$

Alternatively, for further applications to empirical measures, one could start from the infimum-convolution inequalities (5.3) and (5.4), taking them as the main hypothesis on the measure $\mu$. They take an intermediate position between Poincaré and logarithmic Sobolev inequalities. However, logarithmic Sobolev inequalities have been much better studied, with a variety of sufficient conditions having been derived.

Now, as in Section 2, we apply the relations (5.3) and (5.4) to functions $g(x_1,\dots,x_n) = \int f\,dF_n$, where $F_n$ is the empirical measure defined for 'observations' $x_1,\dots,x_n$. By the very definition, for any $t > 0$,

$$P_t g(x_1,\dots,x_n) = \sup_{y_1,\dots,y_n \in \mathbf{R}}\Big[g(y_1,\dots,y_n) - \frac{1}{2t}\sum_{i=1}^n |x_i - y_i|^2\Big] = \frac{1}{n}\sum_{i=1}^n \sup_{y_i \in \mathbf{R}}\Big[f(y_i) - \frac{1}{2t/n}\,|x_i - y_i|^2\Big] = \int P_{t/n} f\,dF_n.$$

Similarly, $Q_t g = \int Q_{t/n} f\,dF_n$. Therefore, after integration with respect to $\mu$ and using the identity $P_a(tf) = tP_{ta} f$, we arrive at corresponding empirical supremum- and infimum-convolution inequalities, as follows.

Proposition 5.4. Under LSI($\sigma^2$), for any $F$-integrable Borel-measurable function $f$ on
$\mathbb{R}$ and for any $t > 0$,

$$\log \mathbf{E}\, e^{t(\int f\,dF_n - \int f\,dF)} \le t \int [P_{t\sigma^2/n} f - f]\, dF, \tag{5.5}$$

$$\log \mathbf{E}\, e^{t(\int f\,dF - \int f\,dF_n)} \le t \int [f - Q_{t\sigma^2/n} f]\, dF. \tag{5.6}$$

Note that the second inequality may be derived from the first by changing $f$ to $-f$.

6. Local behavior of empirical distributions 

In this section, we develop a few direct applications of Proposition 5.4 to the behavior
of empirical distribution functions $F_n(x)$ at a fixed point. Such functionals are linear,
that is, of the form $\int f\,dF_n$, corresponding to the indicator function $f = 1_{(-\infty,x]}$
of the half-axis. When $f$ is smooth, Proposition 5.2 tells us that the deviations of $L_n f =
\int f\,dF_n - \int f\,dF$ are of order $\sigma/\sqrt{n}$. In the general non-smooth case, the supremum- and
infimum-convolution operators $P_t f$ and $Q_t f$ behave differently for small values of $t$
and this results in a different rate of fluctuation for $L_n f$.

To see this, let $f = 1_{(-\infty,x]}$. In this case, the functions $P_t f$ and $Q_t f$ may easily be
computed explicitly, but we do not lose much by using the obvious bounds

$$1_{(-\infty,\,x-\sqrt{2t}\,]} \le Q_t f \le P_t f \le 1_{(-\infty,\,x+\sqrt{2t}\,]}.$$

Therefore, (5.5) and (5.6) yield the following proposition.



Proposition 6.1. Under LSI($\sigma^2$), for any $x \in \mathbb{R}$ and $t > 0$, with $h = \sqrt{2\sigma^2 t/n}$,

$$\log \mathbf{E}\, e^{t(F_n(x) - F(x))} \le t\,(F(x+h) - F(x)), \tag{6.1}$$

$$\log \mathbf{E}\, e^{t(F(x) - F_n(x))} \le t\,(F(x) - F(x-h)). \tag{6.2}$$
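The indicator bounds behind Proposition 6.1 can be checked directly, since for $f = 1_{(-\infty,x]}$ both convolutions have simple closed forms. The sketch below is illustrative only: the formulas in `P_indicator` and `Q_indicator` are worked out here from the definitions of $P_t$ and $Q_t$ (they are not quoted from the paper), and the values of `x0` and `t` are arbitrary.

```python
import numpy as np

def P_indicator(y, x, t):
    """P_t f(y) = sup_z [f(z) - (y - z)^2 / (2t)] for f = 1_{(-inf, x]}."""
    y = np.asarray(y, dtype=float)
    gap = np.maximum(y - x, 0.0)            # distance from y to the half-axis (-inf, x]
    return np.maximum(1.0 - gap**2 / (2.0 * t), 0.0)

def Q_indicator(y, x, t):
    """Q_t f(y) = inf_z [f(z) + (y - z)^2 / (2t)] for f = 1_{(-inf, x]}."""
    y = np.asarray(y, dtype=float)
    gap = np.maximum(x - y, 0.0)            # distance from y to the open half-axis (x, +inf)
    return np.minimum(gap**2 / (2.0 * t), 1.0)

x0, t = 0.3, 0.08
h = np.sqrt(2.0 * t)                        # width of the transition zone
ys = np.linspace(-2.0, 2.0, 4001)

f = (ys <= x0).astype(float)
lower = (ys <= x0 - h).astype(float)        # indicator of (-inf, x - sqrt(2t)]
upper = (ys <= x0 + h).astype(float)        # indicator of (-inf, x + sqrt(2t)]
P = P_indicator(ys, x0, t)
Q = Q_indicator(ys, x0, t)
```

Both convolutions differ from $f$ only within distance $\sqrt{2t}$ of the point $x$, which is why the increment $F(x+h) - F(x)$ with $h = \sqrt{2\sigma^2 t/n}$ appears after substituting $t\sigma^2/n$ for $t$.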

These estimates may be used to sharpen Corollary 3.2 and therefore to recover Theorem 1.1
(under the stronger hypothesis on the joint distribution $\mu$, however). Indeed, for
any $t > 0$,

$$\mathbf{E}\, e^{t|F_n(x) - F(x)|} \le \mathbf{E}\, e^{t(F_n(x) - F(x))} + \mathbf{E}\, e^{-t(F_n(x) - F(x))} \le 2\,e^{t(F(x+h) - F(x-h))}.$$

Taking the logarithm and applying Jensen's inequality, we arrive at

$$\mathbf{E}\,|F_n(x) - F(x)| \le (F(x+h) - F(x-h)) + \frac{\log 2}{t}.$$

Now, just integrate this inequality over an arbitrary interval $(a,b)$, $a < b$, and use the
general relation $\int_{-\infty}^{+\infty} (F(x+h) - F(x-h))\,dx \le 2h$ to obtain that

$$\mathbf{E} \int_a^b |F_n(x) - F(x)|\,dx \le 2h + \frac{\log 2}{t}\,(b-a) = 2\sqrt{\frac{2\sigma^2 t}{n}} + \frac{\log 2}{t}\,(b-a).$$

Optimization over $t$ leads to an improved version of Corollary 3.2.
Corollary 6.2. Under LSI($\sigma^2$), for all $a < b$,

$$\mathbf{E} \int_a^b |F_n(x) - F(x)|\,dx \le 4 \left( \frac{\sigma^2 (b-a)}{n} \right)^{1/3}.$$
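The optimization over $t$ can be made explicit. Writing the bound before Corollary 6.2 as $A\sqrt{t} + B/t$ with $A = 2\sqrt{2\sigma^2/n}$ and $B = (b-a)\log 2$, the minimizer is $t_* = (2B/A)^{2/3}$. The sketch below evaluates the minimum and compares it with $4(\sigma^2(b-a)/n)^{1/3}$; the constant 4 is read from a garbled source line and should be treated as an assumption, and the numerical values of `sigma2`, `n` and `length` are arbitrary.

```python
import math

def l1_bound(t, sigma2, n, length):
    """2h + (log 2 / t) * (b - a), with h = sqrt(2 * sigma^2 * t / n)."""
    h = math.sqrt(2.0 * sigma2 * t / n)
    return 2.0 * h + math.log(2.0) * length / t

def optimal_t(sigma2, n, length):
    """Minimizer of A*sqrt(t) + B/t, obtained by setting the derivative to zero."""
    A = 2.0 * math.sqrt(2.0 * sigma2 / n)
    B = math.log(2.0) * length
    return (2.0 * B / A) ** (2.0 / 3.0)

sigma2, n, length = 1.5, 1000, 3.0          # illustrative values; length = b - a
t_star = optimal_t(sigma2, n, length)
best = l1_bound(t_star, sigma2, n, length)
corollary = 4.0 * (sigma2 * length / n) ** (1.0 / 3.0)

# The ratio does not depend on sigma2, n or length.
ratio = best / corollary
expected_ratio = 1.5 * 2.0 ** (-2.0 / 3.0) * math.log(2.0) ** (1.0 / 3.0)
```

Since the ratio is below 1 (roughly 0.84), the optimized bound is indeed dominated by the right-hand side with constant 4.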



Note that in both cases of Proposition 6.1, for any $t \in \mathbb{R}$,

$$\log \mathbf{E}\, e^{t(F_n(x) - F(x))} \le |t|\,(F(x+h) - F(x-h)), \qquad h = \sqrt{2\sigma^2 |t|/n}.$$

Hence, the local behavior of the distribution function $F$ near a given point $x$ turns out to
be responsible for the large deviation behavior at this point of the empirical distribution
function $F_n$ around its mean.

For a quantitative statement, assume that $F$ has a finite Lipschitz constant $M =
\|F\|_{\rm Lip}$, so it is absolutely continuous with respect to Lebesgue measure on the real
line and has a density, bounded by $M$. Recall that we use the quantity

$$\beta = \frac{(M\sigma)^{2/3}}{n^{1/3}}.$$

Proposition 6.3. Assume that $F$ has a density, bounded by a number $M$. Under
LSI($\sigma^2$), for any $x \in \mathbb{R}$ and $r > 0$,

$$\mathbf{P}\{|F_n(x) - F(x)| \ge \beta r\} \le 2e^{-2r^3/27}. \tag{6.3}$$

In particular, with some absolute constant $C$, we have

$$\mathbf{E}\,|F_n(x) - F(x)| \le C\beta. \tag{6.4}$$

Proof. It follows from (6.1) and the Lipschitz bound $F(x+h) - F(x) \le Mh$, with $t = (\alpha n^{1/3})\lambda$ and
$\alpha^3 = \frac{2}{9M^2\sigma^2}$, that

$$\mathbf{E}\, e^{\lambda \xi} \le e^{\frac{2}{3}\lambda^{3/2}}, \qquad \lambda \ge 0,$$

where $\xi = \alpha n^{1/3}(F_n(x) - F(x))$. By Chebyshev's inequality, for any $r > 0$,

$$\mu\{\xi \ge r\} \le e^{\frac{2}{3}\lambda^{3/2} - \lambda r} = e^{-r^3/3}, \qquad \text{where } \lambda = r^2.$$

Similarly, $\mu\{\xi \le -r\} \le e^{-r^3/3}$. Therefore, $\mu\{\alpha n^{1/3}|F_n(x) - F(x)| \ge r\} \le 2e^{-r^3/3}$. Changing
the variable, we are finished.

Note that (6.4) is consistent with the estimate of Theorem 1.1. To derive similar bounds
on the uniform (Kolmogorov) distance $\|F_n - F\| = \sup_x |F_n(x) - F(x)|$ (which we discuss
in the next section), it is better to split the bound (6.3) into the two parts,

$$\mathbf{P}\{F_n(x) - F(x) \ge \beta r\} \le e^{-2r^3/27}, \tag{6.5}$$

$$\mathbf{P}\{F(x) - F_n(x) \ge \beta r\} \le e^{-2r^3/27}, \tag{6.6}$$

which were obtained in the last step of the proof of (6.3).
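The two numerical steps in the proof of (6.3) are easy to verify: the choice $\lambda = r^2$ minimizes the Chebyshev exponent $\frac{2}{3}\lambda^{3/2} - \lambda r$, and the rescaling from $\alpha n^{1/3}$ to $1/\beta$ turns the exponent $r^3/3$ into $2r^3/27$. The following sketch checks both; the function names and the sample value of `r` are illustrative, and the relation $\alpha n^{1/3} = (2/9)^{1/3}/\beta$ is derived here from $\alpha^3 = 2/(9M^2\sigma^2)$ and $\beta = (M\sigma)^{2/3} n^{-1/3}$.

```python
import math

def chebyshev_exponent(lam, r):
    """Exponent in the bound mu{xi >= r} <= exp((2/3) * lam^(3/2) - lam * r)."""
    return (2.0 / 3.0) * lam ** 1.5 - lam * r

def exponent_in_beta_units(r):
    """Exponent of the tail bound for the event |F_n(x) - F(x)| >= beta * r.

    Since alpha * n^(1/3) = (2/9)^(1/3) / beta, this event corresponds to
    level s = (2/9)^(1/3) * r for the rescaled variable xi, with exponent s^3 / 3.
    """
    s = (2.0 / 9.0) ** (1.0 / 3.0) * r
    return s ** 3 / 3.0

r = 1.7
lam_star = r ** 2                            # stationary point: sqrt(lam) = r
best = chebyshev_exponent(lam_star, r)       # should equal -r^3 / 3
```

The exponent is convex in $\lambda$, so the stationary point $\lambda = r^2$ is the global minimizer.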

However, since one might not know whether $F$ is Lipschitz or how it behaves locally,
and since one might want to approximate this measure itself by some canonical distribution
$G$, it is reasonable to provide a more general statement. By Proposition 6.1, for any
$t > 0$,

$$\log \mathbf{E}\, e^{t(F_n(x) - G(x))} \le t\,(F(x+h) - G(x)) \le t\,(G(x+h) - G(x)) + t\,\|F - G\|$$

and, similarly,

$$\log \mathbf{E}\, e^{t(G(x) - F_n(x))} \le t\,(G(x) - G(x-h)) + t\,\|F - G\|.$$

Repeating the preceding argument with the random variable

$$\xi = \alpha n^{1/3}\,(F_n(x) - G(x) - \|F - G\|)$$

and then interchanging $F_n$ and $G$, we get a more general version of Proposition 6.3.

Proposition 6.4. Under LSI($\sigma^2$), for any distribution function $G$ with finite Lipschitz
seminorm $M = \|G\|_{\rm Lip}$, for any $x \in \mathbb{R}$ and $r > 0$,

$$\mathbf{P}\{|F_n(x) - G(x)| \ge \beta r + \|F - G\|\} \le 2e^{-2r^3/27},$$

where $\beta = (M\sigma)^{2/3} n^{-1/3}$. In particular, up to some absolute constant $C$,

$$\mathbf{E}\,|F_n(x) - G(x)| \le C\beta + \|F - G\|. \tag{6.7}$$

Let us stress that in all of these applications of Proposition 5.4, only the indicator
functions $f = 1_{(-\infty,x]}$ were used. One may therefore try to get more information about
deviations of the empirical distributions $F_n$ from the mean $F$ by applying the basic
bounds (5.5) and (5.6) with different (non-smooth) functions $f$.

For example, of considerable interest is the so-called local regime, where one tries to
estimate the number

$$N_I = \mathrm{card}\{i \le n : X_i \in I\}$$

of observations inside a small interval $I = [x, x+\varepsilon]$ and to take into account the size of
the increment $\varepsilon = |I|$. In the case of i.i.d. observations, this may be done using various tools;
already, the formula

$$\mathrm{Var}(F_n(I)) = \frac{F(I)(1 - F(I))}{n} \le \frac{F(x+\varepsilon) - F(x)}{n}$$

suggests that when $F$ is Lipschitz, $F_n(I)$ has small oscillations for small $\varepsilon$ (where $F_n$ and
$F$ are treated as measures).

However, the supremum- and infimum-convolution operators $P_t f$ and $Q_t f$ do not provide
such information. Indeed, for the indicator function $f = 1_I$, by (5.5), we only have,
similarly to Proposition 6.1, that

$$\log \mathbf{E}\, e^{t(F_n(I) - F(I))} \le t\,[(F(x+\varepsilon+h) - F(x+\varepsilon)) + (F(x) - F(x-h))],$$

where $t > 0$ and $h = \sqrt{2\sigma^2 t/n}$. Here, when $h$ is fixed and $\varepsilon \to 0$, the right-hand side is
not vanishing, in contrast with the i.i.d. case. This also shows that standard chaining
arguments, such as Dudley's entropy bound or more delicate majorizing measure techniques
(described, e.g., in [40]), do not properly work through the infimum-convolution
approach.

Nevertheless, the above estimate is still effective for $\varepsilon$ of order $h$, so we can control
deviations of $F_n(I) - F(I)$ relative to $|I|$ when the intervals are not too small. This can be
done with the arguments used in the proof of Proposition 6.3 or, alternatively (although
with worse absolute constants), one can use the inequality (6.3), by applying it to the
points $x$ and $x + \varepsilon$. This immediately gives that

$$\mathbf{P}\{|F_n(I) - F(I)| \ge 2\beta r\} \le 4e^{-2r^3/27}.$$

Changing variables, one may rewrite the above in terms of $N_I$ as

$$\mathbf{P}\{|N_I - nF(I)| \ge n\delta|I|\} \le 4\exp\Big\{-c\,\Big(\frac{\delta|I|}{\beta}\Big)^3\Big\}$$

with $c = 1/112$. Note that the right-hand side is small only when $|I| \gg \beta/\delta$, which is of
order $n^{-1/3}$ with respect to the number of observations.
This can further be generalized if we apply Proposition 6.4.

Corollary 6.5. Let $G$ be a distribution function with density $g(x)$ bounded by a number
$M$. Under LSI($\sigma^2$), for any $\delta > 0$ and any interval $I$ of length $|I| \ge 4\|F - G\|/\delta$,

$$\mathbf{P}\Big\{ \Big| N_I - n \int_I g(x)\,dx \Big| \ge n\delta|I| \Big\} \le 4\exp\Big\{-c\,\Big(\frac{\delta|I|}{\beta}\Big)^3\Big\},$$

where $\beta = (M\sigma)^{2/3} n^{-1/3}$ and $c > 0$ is an absolute constant.

Hence, if

$$|I| \ge \frac{C}{\delta}\,\max\{\beta, \|F - G\|\}$$

and $C > 0$ is large, then, with high probability, we have that $|\frac{N_I}{n} - \int_I g(x)\,dx| \le \delta|I|$.

7. Bounds on the Kolmogorov distance. Proof of Theorem 1.2

As before, let $F_n$ denote the empirical measure associated with observations $x_1,\dots,x_n$
and $F = \mathbf{E}F_n$ their mean with respect to a given probability measure $\mu$ on $\mathbb{R}^n$. In this
section, we derive uniform bounds on $F_n(x) - F(x)$, based on Proposition 6.3, and thus
prove Theorem 1.2. For applications to the matrix scheme, we shall also replace $F$, which
may be difficult to determine, by the well-behaving limit law $G$ (with the argument relying
on Proposition 6.4).

Let the random variables $X_1,\dots,X_n$ have joint distribution satisfying LSI($\sigma^2$), and
assume that $F$ has a finite Lipschitz seminorm $M = \|F\|_{\rm Lip}$. Define

$$\beta = \frac{(M\sigma)^{2/3}}{n^{1/3}}.$$

Proof of Theorem 1.2. We use the inequalities (6.5) and (6.6) to derive an upper
bound on $\|F_n - F\| = \sup_x |F_n(x) - F(x)|$. (For the sake of extension of Theorem 1.2 to
Theorem 7.1 below, we relax the argument and do not assume that $F$ is continuous.)

So, fix $r > 0$ and an integer $N \ge 2$. One can always pick points $-\infty = x_0 < x_1 \le
\cdots \le x_{N-1} < x_N = +\infty$ with the property that

$$F(x_i-) - F(x_{i-1}) \le \frac{1}{N}, \qquad i = 1,\dots,N. \tag{7.1}$$

Note that $F_n(x_0) = F(x_0) = 0$ and $F_n(x_N) = F(x_N) = 1$. It then follows from (6.5) that

$$\mathbf{P}\Big\{ \max_{1 \le i \le N}\, [F_n(x_i-) - F(x_i-)] \ge \beta r \Big\} \le (N-1)\,e^{-2r^3/27}$$

and, similarly, by (6.6),

$$\mathbf{P}\Big\{ \max_{1 \le i \le N}\, [F(x_{i-1}) - F_n(x_{i-1})] \ge \beta r \Big\} \le (N-1)\,e^{-2r^3/27}.$$

Hence, for the random variable

$$\xi_N = \max\Big\{ \max_{1 \le i \le N}\, [F_n(x_i-) - F(x_i-)],\ \max_{1 \le i \le N}\, [F(x_{i-1}) - F_n(x_{i-1})] \Big\},$$

we have that

$$\mathbf{P}\{\xi_N \ge \beta r\} \le 2(N-1)\,e^{-2r^3/27}. \tag{7.2}$$
Now, take any point $x \in \mathbb{R}$ different from all of the $x_j$'s and select $i$ from $1,\dots,N$ such
that $x_{i-1} < x < x_i$. Then, by (7.1),

$$F_n(x) - F(x) \le F_n(x_i-) - F(x_{i-1}) = [F_n(x_i-) - F(x_i-)] + [F(x_i-) - F(x_{i-1})] \le \xi_N + \frac{1}{N}.$$

Similarly,

$$F(x) - F_n(x) \le F(x_i-) - F_n(x_{i-1}) = [F(x_i-) - F(x_{i-1})] + [F(x_{i-1}) - F_n(x_{i-1})] \le \xi_N + \frac{1}{N}.$$

Therefore, $|F_n(x) - F(x)| \le \xi_N + \frac{1}{N}$, which also extends by continuity from the right to
all points $x_j$. Thus, $\|F_n - F\| \le \xi_N + \frac{1}{N}$ and, by (7.2),

$$\mathbf{P}\Big\{ \|F_n - F\| \ge \beta r + \frac{1}{N} \Big\} \le 2(N-1)\,e^{-2r^3/27}.$$

Note that this also holds automatically in the case $N = 1$. Choose $N = [\frac{1}{\beta r}] + 1$. We then
have $\frac{1}{N} \le \beta r$ and get

$$\mathbf{P}\{\|F_n - F\| \ge 2\beta r\} \le \frac{2}{\beta r}\,e^{-2r^3/27}.$$

Finally, changing $2\beta r$ into $r$, we arrive at the bound (1.5) of Theorem 1.2,

$$\mathbf{P}\{\|F_n - F\| \ge r\} \le \frac{4}{r}\exp\Big\{-\frac{2}{27}\Big(\frac{r}{\beta}\Big)^3\Big\} \tag{7.3}$$

and so the constant $c = 2/27$.

It remains to derive the bound on the mean $\mathbf{E}\|F_n - F\|$. Given $0 < r_0 < 1$, we can
write, using (7.3),

$$\mathbf{E}\|F_n - F\| = \int_0^1 \mathbf{P}\{\|F_n - F\| \ge r\}\,dr = \int_0^{r_0} + \int_{r_0}^1 \le r_0 + \frac{4}{r_0}\exp\Big\{-\frac{2}{27}\Big(\frac{r_0}{\beta}\Big)^3\Big\}. \tag{7.4}$$

This bound also holds for $r_0 \ge 1$.

First, assume that $0 < \beta \le 1$ and choose $r_0 = 3\beta \log^{1/3}(1 + \frac{1}{\beta})$. Then, for the last term in
(7.4), we have

$$\frac{4}{r_0}\exp\Big\{-\frac{2}{27}\Big(\frac{r_0}{\beta}\Big)^3\Big\} = \frac{4}{3\beta \log^{1/3}(1 + 1/\beta)}\,e^{-2\log(1 + 1/\beta)} = \frac{4}{3}\,\frac{\beta}{(1+\beta)^2 \log^{1/3}(1 + 1/\beta)} \le B\beta \log^{1/3}\Big(1 + \frac{1}{\beta}\Big)$$

with some constant $B$ satisfying $(1+\beta)^3 \log(1 + \frac{1}{\beta}) \ge (\frac{4}{3B})^{3/2}$. For example, we can take
$B = 2$ and then, by (7.4), we have

$$\mathbf{E}\|F_n - F\| \le 5\beta \log^{1/3}\Big(1 + \frac{1}{\beta}\Big). \tag{7.5}$$

As for the values $\beta \ge 1$, simple calculations show that the right-hand side of (7.5) is
greater than 1, so the inequality (1.6) is fulfilled with $C = 5$.

Theorem 1.2 is therefore proved. □
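The passage from the tail bound to the bound on the mean can be sanity-checked numerically. The sketch below assumes the tail bound (7.3) in the form $\frac{4}{r}\exp\{-\frac{2}{27}(r/\beta)^3\}$, caps it at 1 (it bounds a probability), integrates it over $(0,1]$ by the trapezoidal rule, and compares the result with the right-hand side of (7.5). The grid size and the sample values of $\beta$ are arbitrary choices.

```python
import numpy as np

def tail_bound(r, beta):
    """Right-hand side of (7.3), capped at 1 because it bounds a probability."""
    return np.minimum(1.0, (4.0 / r) * np.exp(-(2.0 / 27.0) * (r / beta) ** 3))

def mean_bound_numeric(beta, num=200_001):
    """Trapezoidal approximation of the integral of the capped tail over (0, 1]."""
    r = np.linspace(1e-9, 1.0, num)          # start slightly above 0 to avoid 4/0
    vals = tail_bound(r, beta)
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(r)) / 2.0)

def mean_bound_closed_form(beta):
    """Right-hand side of (7.5): 5 * beta * log^{1/3}(1 + 1/beta)."""
    return float(5.0 * beta * np.log1p(1.0 / beta) ** (1.0 / 3.0))

betas = [1e-3, 1e-2, 1e-1, 0.5, 1.0]
checks = [(b, mean_bound_numeric(b), mean_bound_closed_form(b)) for b in betas]
```

For each $\beta$ in the list, the numerically integrated tail should stay below the closed-form expression, in line with the choice $r_0 = 3\beta\log^{1/3}(1 + 1/\beta)$ and $B = 2$ in the proof.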



Remark. If $M\sigma$ in Theorem 1.2 were of order 1, then $\mathbf{E}\|F_n - F\|$ would be of order at
most $(\frac{\log n}{n})^{1/3}$. Note, however, that under PI($\sigma^2$), and if all $\mathbf{E}X_i = 0$, the quantity $M\sigma$
is separated from zero and, more precisely, $M\sigma \ge \frac{1}{\sqrt{12}}$.

Indeed, by Hensley's theorem in dimension 1 [1, 26], in the class of all probability
densities $p(x)$ on the line, the expression $(\int x^2 p(x)\,dx)^{1/2}\,\mathrm{ess\,sup}_x\, p(x)$ is minimized for
the uniform distribution on symmetric intervals and is therefore bounded from below
by $1/\sqrt{12}$. Since $F$ is Lipschitz, it has a density $p$ with $M = \mathrm{ess\,sup}_x\, p(x)$. On the other
hand, it follows from the Poincaré-type inequality that $\sigma^2 \ge \mathrm{Var}(X_i) = \mathbf{E}X_i^2$. Averaging
over all $i$'s, we get $\sigma^2 \ge \int x^2\,dF(x)$, so $M\sigma \ge (\int x^2 p(x)\,dx)^{1/2}\,\mathrm{ess\,sup}_x\, p(x)$.
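The lower bound $1/\sqrt{12}$ is easy to illustrate with closed-form moments. The sketch below evaluates the functional $(\int x^2 p(x)\,dx)^{1/2}\,\mathrm{ess\,sup}\,p$ for three centered densities whose second moments and maxima are standard facts: the uniform density on $[-a,a]$, the standard Gaussian, and the two-sided exponential density $\frac{1}{2}e^{-|x|}$. The helper name `hensley_functional` and the value of `a` are illustrative.

```python
import math

def hensley_functional(second_moment, sup_density):
    """(integral of x^2 p(x) dx)^(1/2) times the essential supremum of p."""
    return math.sqrt(second_moment) * sup_density

a = 2.5                                            # half-width of the uniform law
uniform = hensley_functional(a * a / 3.0, 1.0 / (2.0 * a))
gaussian = hensley_functional(1.0, 1.0 / math.sqrt(2.0 * math.pi))
laplace = hensley_functional(2.0, 0.5)             # density exp(-|x|) / 2
lower_bound = 1.0 / math.sqrt(12.0)
```

The uniform law attains the bound for every `a`, since the functional is invariant under rescaling, while the other two densities give strictly larger values.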

With similar arguments based on Proposition 6.4, we also obtain the following
generalization of Theorem 1.2.

Theorem 7.1. Assume that $X_1,\dots,X_n$ have a distribution on $\mathbb{R}^n$ satisfying LSI($\sigma^2$).
Let $G$ be a distribution function with finite Lipschitz seminorm $M$. Then, for all $r > 0$,

$$\mathbf{P}\{\|F_n - G\| \ge r + \|F - G\|\} \le \frac{4}{r}\exp\Big\{-\frac{2}{27}\Big(\frac{r}{\beta}\Big)^3\Big\},$$

where $\beta = (M\sigma)^{2/3} n^{-1/3}$. In particular,

$$\mathbf{E}\|F_n - G\| \le 5\beta \log^{1/3}\Big(1 + \frac{1}{\beta}\Big) + \|F - G\|.$$



Remarks. It remains unclear whether or not one can involve the i.i.d. case in the scheme
of Poincaré or logarithmic Sobolev inequalities to recover the rate $1/\sqrt{n}$ for $\mathbf{E}\|F_n - F\|$,
even if some further natural assumptions are imposed (which are necessary, as we know
from Examples 1 and 2). In particular, one may assume that the quantities $M$ and $\sigma$
are of order 1, and that $\mathbf{E}X_i = 0$, $\mathrm{Var}(X_i) = 1$. The question is the following: under, say,
LSI($\sigma^2$), is it true that

$$\mathbf{E}\|F_n - F\| \le \frac{C}{\sqrt{n}}$$

with some absolute $C$? Or at least $\mathbf{E}\,|F_n(x) - F(x)| \le \frac{C}{\sqrt{n}}$ for individual points?



8. High-dimensional random matrices 



We shall now apply the bounds obtained in Theorems 1.1 and 7.1 to the case of the
spectral empirical distributions. Let $\{\xi_{jk}\}_{1 \le j \le k \le n}$ be a family of independent random
variables on some probability space with mean $\mathbf{E}\xi_{jk} = 0$ and variance $\mathrm{Var}(\xi_{jk}) = 1$. Put
$\xi_{jk} = \xi_{kj}$ for $1 \le k < j \le n$ and introduce a symmetric $n \times n$ random matrix,

$$\frac{1}{\sqrt{n}} \begin{pmatrix} \xi_{11} & \xi_{12} & \cdots & \xi_{1n} \\ \xi_{21} & \xi_{22} & \cdots & \xi_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{n1} & \xi_{n2} & \cdots & \xi_{nn} \end{pmatrix}.$$

Arrange its (real random) eigenvalues in increasing order: $X_1 \le \cdots \le X_n$. As before, we
associate with particular values $X_1 = x_1,\dots,X_n = x_n$ an empirical (spectral) measure
$F_n$ with mean (expected) measure $F = \mathbf{E}F_n$.

An important point in this scheme is that the joint distribution $\mu$ of the spectral values,
as a probability measure on $\mathbb{R}^n$, represents the image of the joint distribution of $\xi_{jk}$'s
under a Lipschitz map $T$ with Lipschitz seminorm $\|T\|_{\rm Lip} = \frac{\sqrt{2}}{\sqrt{n}}$. More precisely, by the
Hoffman-Wielandt theorem with respect to the Hilbert-Schmidt norm, we have

$$\sum_{i=1}^n |\lambda_i - \lambda_i'|^2 \le \frac{1}{n}\sum_{j,k=1}^n |\xi_{jk} - \xi_{jk}'|^2 \le \frac{2}{n}\sum_{1 \le j \le k \le n} |\xi_{jk} - \xi_{jk}'|^2$$

for any collections $\{\xi_{jk}\}_{j \le k}$ and $\{\xi_{jk}'\}_{j \le k}$ with eigenvalues $(\lambda_1,\dots,\lambda_n)$, $(\lambda_1',\dots,\lambda_n')$,
respectively. This is a well-known fact ([2], page 165) which may be used in concentration
problems; see, for example, [15, 30].
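The Lipschitz property of the eigenvalue map can be observed on random input. The sketch below builds two symmetric matrices normalized by $1/\sqrt{n}$ from their upper-triangular entries, computes the ordered eigenvalues with `numpy.linalg.eigvalsh`, and compares the three quantities in the Hoffman-Wielandt chain of inequalities. The helper `symmetric_from_upper`, the Gaussian entries, the perturbation size and the seed are illustrative assumptions.

```python
import numpy as np

def symmetric_from_upper(entries, n):
    """Build (1/sqrt(n)) * (xi_jk) from the entries xi_jk with j <= k."""
    m = np.zeros((n, n))
    m[np.triu_indices(n)] = entries          # fill the upper triangle, diagonal included
    m = m + np.triu(m, 1).T                  # mirror the strict upper triangle
    return m / np.sqrt(n)

rng = np.random.default_rng(0)
n = 50
num_entries = n * (n + 1) // 2
xi = rng.standard_normal(num_entries)
xi_prime = xi + 0.3 * rng.standard_normal(num_entries)

a = symmetric_from_upper(xi, n)
b = symmetric_from_upper(xi_prime, n)
lam = np.linalg.eigvalsh(a)                  # eigenvalues in ascending order
lam_prime = np.linalg.eigvalsh(b)

eig_dist2 = float(np.sum((lam - lam_prime) ** 2))
hs_dist2 = float(np.sum((a - b) ** 2))       # squared Hilbert-Schmidt distance
entry_dist2 = 2.0 / n * float(np.sum((xi - xi_prime) ** 2))
```

The last inequality in the chain is where the factor 2 enters: each off-diagonal entry appears twice in the matrix, while diagonal entries appear once.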

In particular (see Proposition A1 in the Appendix), if the distributions of $\xi_{jk}$'s satisfy
a one-dimensional Poincaré-type inequality with common constant $\sigma^2$, then $\mu$ satisfies
a Poincaré-type inequality with an asymptotically much better constant $\sigma_n^2 = \frac{2\sigma^2}{n}$.
According to Theorem 1.1,

$$\mathbf{E} \int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le C\sigma_n \Big( \frac{A_n + \log n}{n} \Big)^{1/3},$$

where $C$ is an absolute constant and $A_n = \frac{1}{\sigma_n}\max_{i,j} |\mathbf{E}X_i - \mathbf{E}X_j|$. Since $\max_i |\mathbf{E}X_i|$ is
of order at most $\sigma$, $A_n$ is of order at most $\sqrt{n}$ and we arrive at the bound (1.7) in Theorem 1.3:

$$\mathbf{E} \int_{-\infty}^{+\infty} |F_n(x) - F(x)|\,dx \le \frac{C\sigma}{n^{2/3}}.$$



Now, let us explain the second statement of Theorem 1.3 for the case where the $\xi_{jk}$'s
satisfy a logarithmic Sobolev inequality with a common constant $\sigma^2$, in addition to the
normalizing conditions $\mathbf{E}\xi_{jk} = 0$, $\mathrm{Var}(\xi_{jk}) = 1$ (which implies that $\sigma \ge 1$). Let $G$ denote
the standard semicircle law with variance 1, that is, with density $g(x) = \frac{1}{2\pi}\sqrt{4 - x^2}$,
$-2 < x < 2$. In this case, the Lipschitz seminorm is $M = \|G\|_{\rm Lip} = \frac{1}{\pi}$. Also,

$$\beta_n = \frac{(M\sigma_n)^{2/3}}{n^{1/3}} = C'\Big(\frac{\sigma}{n}\Big)^{2/3}$$

for some absolute $C'$. Therefore, applying Theorem 7.1 and using $\sigma \ge 1$, we arrive at the
bound (1.8):

$$\mathbf{E} \sup_x |F_n(x) - G(x)| \le C\sigma^{2/3}\,\frac{\log^{1/3} n}{n^{2/3}} + \sup_x |F(x) - G(x)|. \tag{8.1}$$

Thus, Theorem 1.3 is proved. For individual points that are close to the end-points
$x = \pm 2$ of the supporting interval of the semicircle law, we may get improved bounds
in comparison with (8.1). Namely, by Proposition 6.1 (and repeating the argument from
the proof of the inequality of Corollary 6.2), for all $t > 0$,

$$\begin{aligned}
\mathbf{E}\, e^{t|F_n(x) - G(x)|} &\le \mathbf{E}\, e^{t(F_n(x) - G(x))} + \mathbf{E}\, e^{-t(F_n(x) - G(x))} \\
&\le e^{t(F(x+h) - G(x))} + e^{t(G(x) - F(x-h))} \\
&\le 2\,e^{t(G(x+h) - G(x-h)) + t\|F - G\|},
\end{aligned}$$

where $h = \sqrt{2\sigma_n^2 t/n} = \frac{2\sigma\sqrt{t}}{n}$. Taking the logarithm and applying Jensen's inequality, we
arrive at

$$\mathbf{E}\,|F_n(x) - G(x)| \le \|F - G\| + (G(x+h) - G(x-h)) + \frac{\log 2}{t}. \tag{8.2}$$

Using the Lipschitz property of $G$ only (that is, $G(x+h) - G(x-h) \le \frac{2h}{\pi}$) would yield
the previous bound, such as the one in the estimate (6.7) of Proposition 6.4,

$$\mathbf{E}\,|F_n(x) - G(x)| \le \|F - G\| + C\Big(\frac{\sigma}{n}\Big)^{2/3}. \tag{8.3}$$

However, the real size of increments $G(x+h) - G(x-h)$ with respect to the parameter
$h$ essentially depends on the point $x$. To be more careful in the analysis of the right-hand
side of (8.2), we may use the following elementary calculus bound, whose proof we omit.
Lemma 8.1. $G(x+h) - G(x-h) \le 2g(x)h + \frac{4}{3\pi}h^{3/2}$ for all $x \in \mathbb{R}$ and $h > 0$.
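Since the proof of Lemma 8.1 is omitted, a numerical check is a useful safeguard. The sketch below uses the closed-form distribution function of the semicircle law and tests the inequality on a grid of points $x$ and increments $h$. The constant $\frac{4}{3\pi}$ in front of $h^{3/2}$ is an assumption here: it is the value obtained from the Hölder-type estimate $|g(y) - g(x)| \le \frac{1}{\pi}\sqrt{|y - x|}$, and the constant printed in the source is not legible. The grids are arbitrary.

```python
import numpy as np

def semicircle_density(x):
    """g(x) = sqrt(4 - x^2) / (2 pi) on [-2, 2], and 0 outside."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.clip(4.0 - x**2, 0.0, None)) / (2.0 * np.pi)

def semicircle_cdf(x):
    """G(x) = 1/2 + x sqrt(4 - x^2) / (4 pi) + arcsin(x / 2) / pi, clamped to [-2, 2]."""
    x = np.clip(np.asarray(x, dtype=float), -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

xs = np.linspace(-3.0, 3.0, 601)[:, None]        # points x, including some outside [-2, 2]
hs = np.logspace(-3, 0.5, 50)[None, :]           # increments h from 0.001 to about 3.16

increment = semicircle_cdf(xs + hs) - semicircle_cdf(xs - hs)
bound = 2.0 * semicircle_density(xs) * hs + 4.0 / (3.0 * np.pi) * hs**1.5
worst_slack = float(np.max(increment - bound))   # should not be positive
```

Near the end-points $\pm 2$ the density vanishes, so the $h^{3/2}$ term dominates; this is the source of the improved rate for such points.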

Since $G$ is concentrated on the interval $[-2,2]$, for $|x| \ge 2$, we have a simple bound
$G(x+h) - G(x-h) \le \frac{4}{3\pi}h^{3/2}$. As a result, one may derive from (8.2) an improved variant
of (8.3). In particular, if $|x| \ge 2$, then

$$\mathbf{E}\,|F_n(x) - G(x)| \le \|F - G\| + C\Big(\frac{\sigma}{n}\Big)^{6/7}.$$

The more general statement for all $x \in \mathbb{R}$ is given by the following result.



Theorem 8.2. Let $\xi_{jk}$ $(1 \le j \le k \le n)$ be independent and satisfy a logarithmic Sobolev
inequality with constant $\sigma^2$, with $\mathbf{E}\xi_{jk} = 0$ and $\mathrm{Var}(\xi_{jk}) = 1$. For all $x \in \mathbb{R}$,

$$\mathbf{E}\,|F_n(x) - G(x)| \le \|F - G\| + C\Big[ \Big(\frac{\sigma}{n}\Big)^{6/7} + g(x)^{2/3}\Big(\frac{\sigma}{n}\Big)^{2/3} \Big], \tag{8.4}$$

where $C$ is an absolute constant.

A similar uniform bound may also be shown to hold for $\mathbf{E}\sup_{y \le x} |F_n(y) - G(y)|$ $(x \le 0)$
and $\mathbf{E}\sup_{y \ge x} |F_n(y) - G(y)|$ $(x \ge 0)$. Note that in comparison with (8.3), there is an
improvement for the points $x$ at distance not more than $(\frac{\sigma}{n})^{4/7}$ from $\pm 2$.

Proof of Theorem 8.2. According to the bound (8.2) and Lemma 8.1, for any $h > 0$, we
may write $\mathbf{E}\,|F_n(x) - G(x)| \le \|F - G\| + 3\varphi(h)$, where $\varphi(h) = g(x)h + h^{3/2} + \frac{\varepsilon}{h^2}$, $\varepsilon = (\frac{\sigma}{n})^2$.

We shall now estimate the minimum of this function. Write $h = (\frac{\varepsilon}{1+\alpha})^{2/7}$ with parameter
$\alpha > 0$ to be specified later on. If $g(x) \le \alpha\sqrt{h}$, then

$$\varphi(h) \le (1+\alpha)\,h^{3/2} + \frac{\varepsilon}{h^2} = 2(1+\alpha)^{4/7}\varepsilon^{3/7}. \tag{8.5}$$

Note that the requirement on $g(x)$ is equivalent to $\frac{g(x)^7}{\varepsilon} \le \frac{\alpha^7}{1+\alpha}$. Thus, we set $A = \frac{g(x)^7}{\varepsilon}$
and take $\alpha = 1 + (2A)^{1/6}$. Since $\alpha \ge 1$, we get $\frac{\alpha^7}{1+\alpha} \ge \frac{\alpha^6}{2} \ge A$. Hence, we may apply (8.5).
Using $(1+\alpha)^{4/7} \le (2\alpha)^{4/7}$ and $\alpha^{4/7} \le 1 + ((2A)^{1/6})^{4/7} = 1 + (2A)^{2/21}$, we finally get that

$$\varphi(h) \le 2 \cdot 2^{4/7}\,(1 + (2A)^{2/21})\,\varepsilon^{3/7} \le 4\,(\varepsilon^{3/7} + A^{2/21}\varepsilon^{3/7}).$$

This is the desired expression in square brackets in (8.4) and Theorem 8.2 follows. □
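The choice of $h$ in this proof can be checked numerically. The sketch below assumes $\varphi(h) = g h + h^{3/2} + \varepsilon/h^2$ and $\alpha = 1 + (2A)^{1/6}$ with $A = g^7/\varepsilon$ (these forms are reconstructed from a partly illegible source and should be read as assumptions), and verifies, over a range of values of $g$ and $\varepsilon$, the requirement $g \le \alpha\sqrt{h}$, the intermediate bound (8.5), and the final bound.

```python
import math

def phi(h, g, eps):
    """phi(h) = g*h + h^(3/2) + eps / h^2."""
    return g * h + h ** 1.5 + eps / h ** 2

def proof_choice(g, eps):
    """Parameters used in the proof: A = g^7 / eps, alpha = 1 + (2A)^(1/6), h = (eps/(1+alpha))^(2/7)."""
    A = g ** 7 / eps
    alpha = 1.0 + (2.0 * A) ** (1.0 / 6.0)
    h = (eps / (1.0 + alpha)) ** (2.0 / 7.0)
    return A, alpha, h

results = []
g_values = [k / (10.0 * math.pi) for k in range(11)]      # 0 <= g <= 1/pi
eps_values = [10.0 ** (-k) for k in range(2, 11)]
for g in g_values:
    for eps in eps_values:
        A, alpha, h = proof_choice(g, eps)
        value = phi(h, g, eps)
        bound_85 = 2.0 * (1.0 + alpha) ** (4.0 / 7.0) * eps ** (3.0 / 7.0)
        final = 4.0 * (eps ** (3.0 / 7.0) + A ** (2.0 / 21.0) * eps ** (3.0 / 7.0))
        results.append((g, alpha * math.sqrt(h), value, bound_85, final))
```

Note that $A^{2/21}\varepsilon^{3/7} = g^{2/3}\varepsilon^{1/3}$ and $\varepsilon^{3/7} = (\sigma/n)^{6/7}$, which gives the two terms in the square brackets of (8.4).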



Finally, let us comment on the meaning of the general Corollary 6.5 in the matrix
model above. To every interval $I$ on the real line, we associate the number

$$N_I = \mathrm{card}\{i \le n : X_i \in I\}$$

of eigenvalues $X_i$ inside it. Again, Corollary 6.5 may be applied to the standard semicircle
law $G$ with density $g(x)$, in which case $\beta = C(\frac{\sigma}{n})^{2/3}$. This gives that, under LSI($\sigma^2$),
imposed on the entries, for any $\delta > 0$ and any interval $I$ of length $|I| \ge 4\|F - G\|/\delta$,
we have that

$$\Big| \frac{N_I}{n} - \int_I g(x)\,dx \Big| \le \delta|I| \tag{8.6}$$

with probability at least $1 - 4\exp\{-\frac{c\,n^2\delta^3}{\sigma^2}|I|^3\}$. As we have already mentioned, under
PI($\sigma^2$), one can show that $\|F - G\| \le Cn^{-2/3}$ (Theorem 1.1). Therefore, (8.6) holds
true with high probability, provided that

$$|I| \ge Cn^{-2/3}/\delta \tag{8.7}$$

with large $C$ (of order, say, $\log^{\varepsilon} n$).

Such properties have been intensively studied in recent years in connection with the
universality problem. In particular, it is shown in [18] and [41] that the restriction (8.7)
may be weakened to $|I| \ge C_\varepsilon (\log^{\alpha} n)/n$ under the assumption that the intervals $I$ are
contained in $[-2 + \varepsilon, 2 - \varepsilon]$, $\varepsilon > 0$, that is, 'in the bulk'.



Appendix 

Here we recall some facts about Poincaré-type and log-Sobolev inequalities. While Lemmas
2.2 and 5.3 list some of their consequences, one might wonder which measures actually
satisfy these analytic inequalities. Many interesting examples can be constructed
with the help of the following elementary proposition.



Proposition A1. Let $\mu_1,\dots,\mu_N$ be probability measures on $\mathbb{R}$ satisfying PI($\sigma^2$) (resp.,
LSI($\sigma^2$)). The image $\mu$ of the product measure $\mu_1 \otimes \cdots \otimes \mu_N$ under any map $T : \mathbb{R}^N \to \mathbb{R}^n$
with finite Lipschitz seminorm satisfies PI($\sigma^2\|T\|_{\rm Lip}^2$) (resp., LSI($\sigma^2\|T\|_{\rm Lip}^2$)).

On the real line, disregarding the problem of optimal constants, Poincaré-type inequalities
may be reduced to Hardy-type inequalities with weights. Necessary and sufficient
conditions for a measure on the positive half-axis to satisfy a Hardy-type inequality with
general weights were found in the late 1950s in the work of Kac and Krein [27]. We refer
the interested reader to [35] and [34] for a full characterization and an account of the
history; here, we just recall the principal result (see also [7]).

Let $\mu$ be a probability measure on the line with median $m$, that is, $\mu(-\infty, m) \le \frac{1}{2}$ and
$\mu(m, +\infty) \le \frac{1}{2}$. Define the quantities

$$A_0(\mu) = \sup_{x < m} \Big[ \mu(-\infty, x) \int_x^m \frac{dt}{p_\mu(t)} \Big], \qquad A_1(\mu) = \sup_{x > m} \Big[ \mu(x, +\infty) \int_m^x \frac{dt}{p_\mu(t)} \Big],$$

where $p_\mu$ denotes the density of the absolutely continuous component of $\mu$ (with respect
to Lebesgue measure) and where we set $A_0 = 0$ (resp., $A_1 = 0$) if $\mu(-\infty, m) = 0$ (resp.,
$\mu(m, +\infty) = 0$). We then have the following proposition.

Proposition A2. The measure $\mu$ on $\mathbb{R}$ satisfies PI($\sigma^2$) with some finite constant if and
only if both $A_0(\mu)$ and $A_1(\mu)$ are finite. Moreover, the optimal value of $\sigma^2$ satisfies

$$c_0\,(A_0(\mu) + A_1(\mu)) \le \sigma^2 \le c_1\,(A_0(\mu) + A_1(\mu)),$$

where $c_0$ and $c_1$ are positive universal constants.

Necessarily, $\mu$ must have a non-trivial absolutely continuous part with density which
is positive almost everywhere on the supporting interval.

For example, the two-sided exponential measure $\mu_0$, with density $\frac{1}{2}e^{-|x|}$, satisfies
PI($\sigma^2$) with $\sigma^2 = 4$. Therefore, any Lipschitz transform $\mu = \mu_0 T^{-1}$ of $\mu_0$ satisfies PI($\sigma^2$)
with $\sigma^2 = 4\|T\|_{\rm Lip}^2$. The latter property may be expressed analytically in terms of the
reciprocal to the so-called isoperimetric constant,

$$H(\mu) = \mathrm{ess\,inf}_x\ \frac{p_\mu(x)}{\min\{F_\mu(x),\, 1 - F_\mu(x)\}},$$

where $F_\mu(x) = \mu(-\infty, x]$ denotes the distribution function of $\mu$ and $p_\mu$ the density of its
absolutely continuous component. Namely, as a variant of the Maz'ya-Cheeger theorem,
we have that PI($\sigma^2$) is valid with $\sigma^2 = 4/H(\mu)^2$; see [9], Theorem 1.3.

To roughly describe the class of measures in the case where $\mu$ is absolutely continuous
and has a positive, continuous well-behaving density, one may note that $H(\mu)$ and the
Poincaré constant are finite, provided that the measure has a finite exponential moment.
In particular, any probability measure with a logarithmically concave density satisfies
PI($\sigma^2$) with a finite $\sigma$; see [4].

As for logarithmic Sobolev inequalities, we have a similar picture, where the standard
Gaussian measure represents a basic example and plays a similar role as the two-sided
exponential distribution for Poincaré-type inequalities. A full description on the real line,
resembling Proposition A2, was given in [6]. Namely, for a one-dimensional probability
measure $\mu$, with previous notation, we define the quantities

$$B_0(\mu) = \sup_{x < m} \Big[ \mu(-\infty, x)\,\log\frac{1}{\mu(-\infty, x)} \int_x^m \frac{dt}{p_\mu(t)} \Big],$$

$$B_1(\mu) = \sup_{x > m} \Big[ \mu(x, +\infty)\,\log\frac{1}{\mu(x, +\infty)} \int_m^x \frac{dt}{p_\mu(t)} \Big].$$

We then have the following proposition.

Proposition A3. The measure $\mu$ on $\mathbb{R}$ satisfies LSI($\sigma^2$) with some finite constant if
and only if $B_0(\mu)$ and $B_1(\mu)$ are finite. Moreover, the optimal value of $\sigma^2$ satisfies

$$c_0\,(B_0(\mu) + B_1(\mu)) \le \sigma^2 \le c_1\,(B_0(\mu) + B_1(\mu)),$$

where $c_0$ and $c_1$ are positive universal constants.

In particular, if $\mu$ has a log-concave density, then LSI($\sigma^2$) is satisfied with some finite
constant if and only if $\mu$ has sub-Gaussian tails.